Distributed Tracing with OpenTracing, ZipKin and Kubernetes

Post on 11-Jan-2017


container-solutions.com | @containersoluti | info@container-solutions.com

Distributed Tracing with ZipKin & Kubernetes

Maximilian Schöfmann, @schoefmann

Container Solutions AG, @containersoluti


Microservices...

“In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies.”

-- James Lewis and Martin Fowler


The “Sock Shop”...

microservices-demo.github.io


Let’s install it in the cluster...


Microservice benefits

● Modeling after business domains

● Independent deployment

● Technology diversity


Microservice costs

● Distribution

● Eventual Consistency

● Operational complexity


Microservice requirements

● Rapid provisioning

● Monitoring

● Rapid deployment

● Autonomous teams


Microservice architectures

● Monolithic to microservice architecture

● Apps as a collection of distributed services

● Tools becoming necessary to gather metrics


Why distributed tracing?

Example: Google search query

● Multiple index lookups

● Selecting ads

● Check spelling

● Personalise results

● Filter DMCA takedowns

● Include relevant images...

● ...and videos

● ...and news

● ...


Why distributed tracing?

“Per-process logging and metric monitoring have their place, but neither can reconstruct the elaborate journeys that transactions take as they propagate across a distributed system. Distributed traces are these journeys.”

-- Chris Aniszczyk, Cloud Native Computing Foundation


Fundamental requirements to make it work

● Ubiquitous deployment

● Continuous monitoring

See also: “Dapper, a Large-Scale Distributed Systems Tracing Infrastructure” (2010), http://research.google.com/pubs/pub36356.html


Requirements to make it useful

● Low overhead

● Application-level transparency

● Scalability

● (Timely) data availability


A distributed trace...

“A tracing infrastructure for distributed services needs to record information about all the work done in a system, on behalf of a given initiator”


Data aggregation

Message record:

Record = Message identifier + timestamped event

Data aggregation classes:

● Black box

● Annotation-based
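The annotation-based class can be sketched roughly as follows: every record carries a message identifier plus a timestamped event, and the collector groups records by that identifier. The `Record` type and the `groupByMessage` function are hypothetical names chosen for this illustration, and timestamps are assumed to be microseconds since the epoch.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is one logged entry: a message identifier plus a timestamped event.
// Hypothetical type, for illustration only.
type Record struct {
	MessageID       string
	TimestampMicros int64 // assumed unit: microseconds since the epoch
	Event           string
}

// groupByMessage collects all records that share a message identifier and
// orders each group by timestamp.
func groupByMessage(records []Record) map[string][]Record {
	groups := make(map[string][]Record)
	for _, r := range records {
		groups[r.MessageID] = append(groups[r.MessageID], r)
	}
	for _, g := range groups {
		sort.Slice(g, func(i, j int) bool {
			return g[i].TimestampMicros < g[j].TimestampMicros
		})
	}
	return groups
}

func main() {
	records := []Record{
		{"req-1", 30000, "server send"},
		{"req-2", 5000, "client send"},
		{"req-1", 0, "client send"},
	}
	groups := groupByMessage(records)
	for _, e := range groups["req-1"] {
		fmt.Println(e.TimestampMicros, e.Event)
	}
}
```

A black-box scheme would have to infer these groups statistically, because no shared identifier is attached to the records.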


Trace data model

● Trace as a tree of nested calls

● Trace trees and spans


Span

Logged event in a typical span

● Span name

● Span start time

● Span end time

● Trace id

● Span id

● Span parent id

● Any timing information recorded by the instrumentation library (RPC, HTTP)

● Additional custom labels (“foo”)
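A minimal sketch of how these fields could be modelled, and how the parent ids turn a flat list of spans into a trace tree. The struct layout and the helper names `childrenByParent` and `render` are assumptions made for this example, not the data model of any particular tracer.

```go
package main

import (
	"fmt"
	"strings"
)

// Span mirrors the fields listed above. Names and types are illustrative.
type Span struct {
	Name        string
	StartMicros int64
	EndMicros   int64
	TraceID     string
	ID          string
	ParentID    string            // empty for the root span
	Tags        map[string]string // additional custom labels
}

// childrenByParent indexes spans by parent id so the tree can be walked
// starting from the root (parent id "").
func childrenByParent(spans []Span) map[string][]Span {
	idx := make(map[string][]Span)
	for _, s := range spans {
		idx[s.ParentID] = append(idx[s.ParentID], s)
	}
	return idx
}

// render returns one indented line per span, depth first.
func render(idx map[string][]Span, parentID string, depth int) []string {
	var lines []string
	for _, s := range idx[parentID] {
		line := fmt.Sprintf("%s%s (%dus)",
			strings.Repeat("  ", depth), s.Name, s.EndMicros-s.StartMicros)
		lines = append(lines, line)
		lines = append(lines, render(idx, s.ID, depth+1)...)
	}
	return lines
}

func main() {
	spans := []Span{
		{Name: "GET /cart", StartMicros: 0, EndMicros: 900, TraceID: "t1", ID: "1",
			Tags: map[string]string{"foo": "bar"}},
		{Name: "cart-db query", StartMicros: 100, EndMicros: 400, TraceID: "t1", ID: "2", ParentID: "1"},
		{Name: "GET /catalogue", StartMicros: 450, EndMicros: 850, TraceID: "t1", ID: "3", ParentID: "1"},
	}
	for _, line := range render(childrenByParent(spans), "", 0) {
		fmt.Println(line)
	}
}
```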


OpenTracing & ZipKin

Common libraries for several programming languages

➔ Libraries attach a trace context to thread-local storage

➔ RPC-friendly (especially when using gRPC)

➔ The data is language-independent

opentracing.io zipkin.io
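How a trace context travels between services can be sketched as below. Go exposes no thread-local storage, so this example assumes the context is carried in a `context.Context` instead, and crosses process boundaries as HTTP headers. The header names are Zipkin's B3 propagation headers; `TraceContext`, `WithTrace`, `FromContext`, `Inject` and `Extract` are hypothetical helpers written for this illustration, not a library API, and the URL is made up.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

// TraceContext is the language-independent data that follows a request.
type TraceContext struct {
	TraceID      string
	SpanID       string
	ParentSpanID string
	Sampled      bool
}

type ctxKey struct{}

// WithTrace stores the trace context in a Go context.
func WithTrace(ctx context.Context, tc TraceContext) context.Context {
	return context.WithValue(ctx, ctxKey{}, tc)
}

// FromContext retrieves the trace context, if one is present.
func FromContext(ctx context.Context) (TraceContext, bool) {
	tc, ok := ctx.Value(ctxKey{}).(TraceContext)
	return tc, ok
}

// Inject copies the trace context into outgoing request headers.
func Inject(tc TraceContext, h http.Header) {
	h.Set("X-B3-TraceId", tc.TraceID)
	h.Set("X-B3-SpanId", tc.SpanID)
	if tc.ParentSpanID != "" {
		h.Set("X-B3-ParentSpanId", tc.ParentSpanID)
	}
	if tc.Sampled {
		h.Set("X-B3-Sampled", "1")
	} else {
		h.Set("X-B3-Sampled", "0")
	}
}

// Extract reads the trace context from incoming request headers.
// The boolean is false when no usable context was found.
func Extract(h http.Header) (TraceContext, bool) {
	tc := TraceContext{
		TraceID:      h.Get("X-B3-TraceId"),
		SpanID:       h.Get("X-B3-SpanId"),
		ParentSpanID: h.Get("X-B3-ParentSpanId"),
		Sampled:      h.Get("X-B3-Sampled") == "1",
	}
	return tc, tc.TraceID != "" && tc.SpanID != ""
}

func main() {
	ctx := WithTrace(context.Background(), TraceContext{
		TraceID: "463ac35c9f6413ad", SpanID: "a2fb4a1d1a96d312", Sampled: true,
	})
	req, err := http.NewRequest("GET", "http://catalogue/items", nil)
	if err != nil {
		panic(err)
	}
	if tc, ok := FromContext(ctx); ok {
		Inject(tc, req.Header)
	}
	got, ok := Extract(req.Header)
	fmt.Println(got.TraceID, got.Sampled, ok)
}
```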


Let’s install that in the cluster as well...


Supported languages

➔ JavaScript

➔ Python

➔ Java

➔ Scala

➔ Ruby

➔ C#

➔ Golang


Supported frameworks

➔ Express (Node.js, HTTP)

➔ Jersey, RESTEasy, JAX-RS 2, Apache HttpClient, MySQL (Java, HTTP, gRPC)

➔ HDFS, HBase

➔ Spring, Spring Cloud

➔ Apache Cassandra

➔ Finagle

➔ Rack

➔ Golang Context

➔ GoKit

➔ Akka, Spray, Play

➔ Dropwizard

➔ Roll your own


OpenTracing Example (Go)

Explicitly instrumenting a SQL query in a service written in Go
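A minimal sketch of what such explicit instrumentation could look like: the query is wrapped in a span that records the statement as a tag and marks failures. To stay self-contained, it uses a tiny hypothetical `Span` type instead of the real opentracing-go package, and the `run` callback stands in for an actual `database/sql` call. The tag names `db.type`, `db.statement` and `error` follow the OpenTracing semantic conventions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Span is a hypothetical stand-in for a tracer's span type.
type Span struct {
	Operation string
	Start     time.Time
	End       time.Time
	Tags      map[string]string
}

// StartSpan begins a span for the named operation.
func StartSpan(operation string) *Span {
	return &Span{Operation: operation, Start: time.Now(), Tags: map[string]string{}}
}

// SetTag attaches a key/value pair to the span.
func (s *Span) SetTag(key, value string) { s.Tags[key] = value }

// Finish records the end time of the span.
func (s *Span) Finish() { s.End = time.Now() }

// errConnRefused is an example failure used in main.
var errConnRefused = errors.New("connection refused")

// tracedQuery runs the query inside a span and tags the span with the
// statement. run is a placeholder for the real database call and returns
// the number of rows.
func tracedQuery(stmt string, run func(stmt string) (int, error)) (*Span, int, error) {
	span := StartSpan("sql.query")
	defer span.Finish()
	span.SetTag("db.type", "sql")
	span.SetTag("db.statement", stmt)

	rows, err := run(stmt)
	if err != nil {
		span.SetTag("error", "true")
	}
	return span, rows, err
}

func main() {
	ok := func(string) (int, error) { return 3, nil }
	span, rows, err := tracedQuery("SELECT * FROM socks WHERE id = ?", ok)
	fmt.Println(rows, err, span.Tags["db.statement"])

	failing := func(string) (int, error) { return 0, errConnRefused }
	span, _, err = tracedQuery("SELECT 1", failing)
	fmt.Println(err, span.Tags["error"])
}
```

With a real tracer, the span would additionally be linked to its parent through the request's context, so that the query appears under the calling service's span in the trace tree.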


Annotations

● Arbitrary text

● Key/Value pairs

➔ Can be used for common vocabulary, e.g. “http.status_code”, “peer.service”, “sampling.priority”


Architecture (ZipKin with Scribe + Cassandra)


Performance

Low overhead is the key!

Sampling is the solution!

… at least partially...


Sampling

➔ 2-stage sampling:

a. Client: Don’t send every trace instrumented

● limits client-side CPU and bandwidth overhead

● adjustable per service, hard to change in one go

b. Server: Don’t persist every trace received

● limits server-side IO and data volume overhead

● adjustable centrally with simple config change

➔ Adaptive sampling to trade off overhead against missing relevant traces
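The two stages can be sketched as follows. This is a simplified, fixed-rate illustration (not adaptive), and the `Sampler` type is a hypothetical name. It assumes trace ids are uniformly distributed and derives the decision from the trace id, so every service that sees the same id reaches the same decision.

```go
package main

import "fmt"

// Sampler keeps a fixed fraction of traces. The decision depends only on
// the trace id, so it is consistent across services.
type Sampler struct {
	Rate float64 // fraction of traces to keep, from 0.0 to 1.0
}

// Keep reports whether the trace with the given id should be kept.
func (s Sampler) Keep(traceID uint64) bool {
	const buckets = 10000
	return traceID%buckets < uint64(s.Rate*buckets)
}

func main() {
	client := Sampler{Rate: 0.5}    // stage a: configured per service
	collector := Sampler{Rate: 0.1} // stage b: configured centrally

	sent, stored := 0, 0
	for id := uint64(0); id < 10000; id++ {
		if !client.Keep(id) {
			continue // never leaves the client
		}
		sent++
		if collector.Keep(id) {
			stored++ // persisted by the server
		}
	}
	fmt.Println("sent:", sent, "stored:", stored)
}
```

Because both stages in this sketch derive their decision from the same id, the stored fraction is the smaller of the two rates rather than their product. An adaptive sampler would adjust `Rate` at runtime based on observed traffic.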


But what about...

● Proprietary services?

● Ancient/Legacy Services?

● 3rd-Party services outside your control?

Proxying!


Linkerd overview

● Intelligent, adaptive load-balancing

● Global, fine-grained instrumentation

● Application-centric naming

● Powerful traffic routing mechanisms


Some of the questions answered with a distributed tracing system are:

● Which parts of my system are slow?

● Which call pattern can be optimized with parallelization?

● Which calls are redundant?

● Which routes are affected by this failing part?

● Under which circumstances is it failing?

● How often is it failing?

● Which queries are issued to read/write masters instead of read-only replicas?
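Two of these questions can be answered mechanically once the spans of a trace are available. The sketch below is a rough illustration under simple assumptions: sibling spans that do not overlap in time are flagged as candidates for parallelization, and spans with the same name under the same parent are flagged as possibly redundant. The `Span` struct and both function names are hypothetical.

```go
package main

import "fmt"

// Span holds only the fields needed for this analysis.
type Span struct {
	Name        string
	ID          string
	ParentID    string
	StartMicros int64
	EndMicros   int64
}

// sequentialSiblings returns pairs of span names that share a parent and do
// not overlap in time. They could run in parallel, provided the second call
// does not depend on the result of the first.
func sequentialSiblings(spans []Span) [][2]string {
	var pairs [][2]string
	for i, a := range spans {
		for _, b := range spans[i+1:] {
			if a.ParentID == "" || a.ParentID != b.ParentID {
				continue
			}
			if a.EndMicros <= b.StartMicros || b.EndMicros <= a.StartMicros {
				pairs = append(pairs, [2]string{a.Name, b.Name})
			}
		}
	}
	return pairs
}

// repeatedCalls counts spans with the same name under the same parent and
// returns those that occur more than once, keyed by "parentID/name".
func repeatedCalls(spans []Span) map[string]int {
	counts := make(map[string]int)
	for _, s := range spans {
		counts[s.ParentID+"/"+s.Name]++
	}
	repeated := make(map[string]int)
	for key, n := range counts {
		if n > 1 {
			repeated[key] = n
		}
	}
	return repeated
}

func main() {
	spans := []Span{
		{Name: "GET /orders", ID: "1", StartMicros: 0, EndMicros: 1000},
		{Name: "GET /user", ID: "2", ParentID: "1", StartMicros: 100, EndMicros: 300},
		{Name: "GET /cart", ID: "3", ParentID: "1", StartMicros: 300, EndMicros: 600},
		{Name: "GET /user", ID: "4", ParentID: "1", StartMicros: 350, EndMicros: 500},
	}
	fmt.Println(sequentialSiblings(spans))
	fmt.Println(repeatedCalls(spans))
}
```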


A word of caution about distributed tracing

● Documentation is still rather poor

● Yet another moving part

● Can accumulate huge amounts of data

● Metrics need to be interpreted

● Commercial APM solutions might be an easier route for your use case...


Demo time...


Questions? Want to learn more?

● Come to our 2-day workshop: tinyurl.com/microservice-workshop

(November 8 and 9, or at your company on request)

● Follow us on Twitter: @containersoluti

● Read more on our blog: container-solutions.com/blog

● Or just get in touch: info@container-solutions.com