Post on 11-Jan-2017
container-solutions.com | @containersoluti | info@container-solutions.com
Distributed Tracing with Zipkin & Kubernetes
Maximilian Schöfmann @schoefmann
Container Solutions AG @containersoluti
Microservices...
“In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies.”
-- James Lewis and Martin Fowler
www.container-solutions.com | info@container-solutions.com
The “Socks Shop”...
microservices-demo.github.io
Let’s install it in the cluster...
Distributed Tracing | container-solutions.com
Microservice benefits
● Modeling after business domains
● Independent deployment
● Technology diversity
Microservice costs
● Distribution
● Eventual consistency
● Operational complexity
Microservice requirements
● Rapid provisioning
● Monitoring
● Rapid deployment
● Autonomous teams
Microservice architectures
● From monolithic to microservice architecture
● Apps as a collection of distributed services
● Tools become necessary to gather metrics
Why distributed tracing?
Example: Google search query
● Multiple index lookups
● Selecting ads
● Check spelling
● Personalise results
● Filter DMCA takedowns
● Include relevant images...
● ...and videos
● ...and news
● ...
Why distributed tracing?
“Per-process logging and metric monitoring have their place, but neither can reconstruct the elaborate journeys that transactions take as they propagate across a distributed system. Distributed traces are these journeys.”
-- Chris Aniszczyk, Cloud Native Computing Foundation
Fundamental requirements to make it work
● Ubiquitous deployment
● Continuous monitoring
See also: “Dapper, a Large-Scale Distributed Systems Tracing Infrastructure” (2010)
http://research.google.com/pubs/pub36356.html
Requirements to make it useful
● Low overhead
● Application-level transparency
● Scalability
● (Timely) data availability
A distributed trace...
“A tracing infrastructure for distributed services needs to record information about all the work done in a system, on behalf of a given initiator”
Data aggregation
Message record:
Record = Message identifier + timestamped event
Data aggregation classes:
● Black box
● Annotation-based
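In the Dapper paper's terms, an annotation-based scheme has the application or middleware tag every record with an explicit identifier, while a black-box scheme has to infer the association statistically. A minimal sketch of the annotation-based case, assuming hypothetical field names (`message_id`, `timestamp`, `event`) and made-up sample records:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    message_id: str   # identifier shared by all events of one request
    timestamp: float  # when the event happened (seconds)
    event: str        # what happened


def aggregate(records):
    """Group records by message identifier and order each group by time."""
    traces = defaultdict(list)
    for rec in records:
        traces[rec.message_id].append(rec)
    for recs in traces.values():
        recs.sort(key=lambda r: r.timestamp)
    return dict(traces)


# Records arrive interleaved and out of order, as they would from many hosts.
records = [
    Record("req-2", 10.0, "frontend: request received"),
    Record("req-1", 1.0, "frontend: request received"),
    Record("req-1", 1.2, "orders: query started"),
    Record("req-1", 1.1, "frontend: calling orders"),
]
traces = aggregate(records)
for rec in traces["req-1"]:
    print(rec.timestamp, rec.event)
```

Because every record carries the identifier, aggregation is a plain group-by; no guessing about which events belong together is needed.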
Trace data model
● Trace as a tree of nested calls
● Trace trees and spans
Span
Logged events in a typical span
● Span name
● Span start time
● Span end time
● Trace id
● Span id
● Span parent id
● Any timing information recorded by the instrumentation library (RPC, HTTP)
● Additional custom labels (“foo”)
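The span fields listed above are enough to rebuild the trace tree: spans that share a trace id are linked through their parent ids. The following is a rough sketch, not Zipkin's actual data model; the class, the function names, and the sample spans are made up for illustration, and it assumes a complete trace in which every parent span is present.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str
    start: float                      # span start time (ms)
    end: float                        # span end time (ms)
    trace_id: str
    span_id: str
    parent_id: Optional[str] = None   # None marks the root span
    labels: dict = field(default_factory=dict)  # custom labels

    @property
    def duration(self):
        return self.end - self.start


def build_tree(spans):
    """Return (root, children), where children maps span_id -> child spans."""
    children = {s.span_id: [] for s in spans}
    root = None
    for s in spans:
        if s.parent_id is None:
            root = s
        else:
            children[s.parent_id].append(s)
    return root, children


def render(span, children, depth=0):
    """Render the tree as indented lines, children ordered by start time."""
    lines = ["  " * depth + f"{span.name} ({span.duration:.0f} ms)"]
    for child in sorted(children[span.span_id], key=lambda c: c.start):
        lines.extend(render(child, children, depth + 1))
    return lines


spans = [
    Span("catalogue.get", 60, 110, "t1", "d", "a"),
    Span("db.query", 15, 45, "t1", "c", "b", {"foo": "bar"}),
    Span("GET /cart", 0, 120, "t1", "a"),
    Span("carts.lookup", 10, 50, "t1", "b", "a"),
]
root, children = build_tree(spans)
print("\n".join(render(root, children)))
```

Spans can be reported in any order and by different services; the parent id alone determines where each one sits in the tree.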
OpenTracing & Zipkin
Common libraries for several programming languages
➔ Libraries attach a trace context to thread-local storage
➔ RPC friendly (especially when using gRPC)
➔ The data is language-independent
opentracing.io zipkin.io
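How a library might keep the trace context in thread-local storage and pass it along with an outgoing call can be sketched as follows. The header names are Zipkin's B3 propagation headers; everything else (function names, the dict-based context) is a simplification invented here and is not the API of any OpenTracing or Zipkin library.

```python
import secrets
import threading

_local = threading.local()  # each thread sees its own trace context


def new_id():
    return secrets.token_hex(8)  # 16 hex characters, i.e. a 64-bit id


def set_context(trace_id, span_id):
    _local.ctx = {"trace_id": trace_id, "span_id": span_id}


def current_context():
    return getattr(_local, "ctx", None)


def inject(headers):
    """Copy the current trace context into outgoing request headers."""
    ctx = current_context()
    if ctx is not None:
        headers["X-B3-TraceId"] = ctx["trace_id"]
        headers["X-B3-ParentSpanId"] = ctx["span_id"]
        headers["X-B3-SpanId"] = new_id()  # id of the new child span
    return headers


def extract(headers):
    """Adopt the context from incoming headers, or start a new trace."""
    trace_id = headers.get("X-B3-TraceId") or new_id()
    span_id = headers.get("X-B3-SpanId") or new_id()
    set_context(trace_id, span_id)
    return current_context()


extract({})  # no incoming headers: this request starts a new trace
caller = dict(current_context())
outgoing = inject({"Accept": "application/json"})
print(outgoing)
```

A service that receives `outgoing` and calls `extract` on it continues the same trace. Real HTTP headers are case-insensitive, which this plain-dict sketch ignores.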
Let’s also install that in the cluster...
Supported languages
➔ JavaScript
➔ Python
➔ Java
➔ Scala
➔ Ruby
➔ C#
➔ Golang
Supported frameworks
➔ Express (Node.js, HTTP)
➔ Jersey, RESTEasy, JAX-RS 2, Apache HttpClient, MySQL (Java, HTTP, gRPC)
➔ HDFS, HBase
➔ Spring, Spring Cloud
➔ Apache Cassandra
➔ Finagle
➔ Rack
➔ Golang Context
➔ Go kit
➔ Akka, Spray, Play
➔ Dropwizard
➔ Roll your own
OpenTracing Example (Go)
Explicitly instrumenting a SQL query in a service written in Go
Annotations
● Arbitrary text
● Key/Value pairs
➔ Can be used for common vocabulary, e.g. “http.status_code”, “peer.service”, “sampling.priority”
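The two annotation kinds can be pictured as two small collections on a span: timestamped free-text entries and key/value pairs. This is an illustrative sketch only; `AnnotatedSpan`, `set_tag` and `log` are names chosen here and do not claim to match any particular tracing library, while the keys come from the vocabulary quoted above.

```python
import time


class AnnotatedSpan:
    def __init__(self, name, clock=time.time):
        self.name = name
        self.tags = {}   # key/value pairs, ideally from a shared vocabulary
        self.logs = []   # (timestamp, arbitrary text) entries
        self._clock = clock

    def set_tag(self, key, value):
        self.tags[key] = value
        return self

    def log(self, text):
        self.logs.append((self._clock(), text))
        return self


span = AnnotatedSpan("GET /orders")
span.set_tag("http.status_code", 200).set_tag("peer.service", "orders")
span.log("cache miss, falling back to database")
print(span.tags, span.logs)
```

Agreeing on keys such as “http.status_code” across services is what lets a tracing backend filter and compare spans written in different languages.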
Architecture (Zipkin with Scribe + Cassandra)
Performance
Low overhead is the key!
Sampling is the solution!
… at least partially...
Sampling
➔ 2-stage sampling:
a. Client: Don’t send every instrumented trace
● limits client-side CPU and bandwidth overhead
● adjustable per service, hard to change in one go
b. Server: Don’t persist every trace received
● limits server-side IO and data volume overhead
● adjustable centrally with simple config change
➔ Adaptive sampling to trade off overhead against missing relevant traces
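The effect of the two stages can be simulated with two independent probabilistic samplers. This is a toy model under stated assumptions: the rates (10% on the client, 50% on the server) and the class name are invented for illustration, and a real system decides once per trace and propagates that decision rather than re-rolling for every span.

```python
import random


class ProbabilisticSampler:
    """Keep roughly `rate` of all traces (0.0 = none, 1.0 = all)."""

    def __init__(self, rate, rng=None):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0 and 1")
        self.rate = rate
        self._rng = rng or random.Random()

    def keep(self):
        return self._rng.random() < self.rate


# Seeded generators make the simulation repeatable.
client_sampler = ProbabilisticSampler(0.1, random.Random(42))  # per service
server_sampler = ProbabilisticSampler(0.5, random.Random(7))   # central

sent = stored = 0
for _ in range(10_000):
    if client_sampler.keep():        # stage a: client sends the trace
        sent += 1
        if server_sampler.keep():    # stage b: server persists it
            stored += 1

print(f"sent {sent} of 10000 traces, stored {stored}")
```

The overall retention is the product of both rates (about 5% here). Only the server-side rate can be changed in one place, which is why it is the natural knob for adaptive sampling.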
But what about...
● Proprietary services?
● Ancient/Legacy Services?
● 3rd-Party services outside your control?
Proxying!
Linkerd overview
● Intelligent, adaptive load-balancing
● Global, fine-grained instrumentation
● Application-centric naming
● Powerful traffic routing mechanisms
Some of the questions a distributed tracing system can answer:
● Which parts of my system are slow?
● Which call patterns can be optimized with parallelization?
● Which calls are redundant?
● Which routes are affected by this failing part?
● Under which circumstances is it failing?
● How often is it failing?
● Which queries are issued to read/write masters instead of read-only replicas?
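As an example of the first question, the time a span spends in its own code (its duration minus the duration of its children) points at the slow part of a request. A minimal sketch, assuming made-up span data, unique span names, and children that run one after another without overlapping:

```python
def self_times(spans):
    """Map span name -> time spent in the span itself, excluding children.

    Each span is a dict with id, parent, name, start and end (ms).
    """
    child_time = {}
    for s in spans:
        if s["parent"] is not None:
            duration = s["end"] - s["start"]
            child_time[s["parent"]] = child_time.get(s["parent"], 0) + duration
    return {
        s["name"]: (s["end"] - s["start"]) - child_time.get(s["id"], 0)
        for s in spans
    }


spans = [
    {"id": "a", "parent": None, "name": "GET /cart", "start": 0, "end": 120},
    {"id": "b", "parent": "a", "name": "carts.lookup", "start": 10, "end": 50},
    {"id": "c", "parent": "b", "name": "db.query", "start": 15, "end": 45},
    {"id": "d", "parent": "a", "name": "catalogue.get", "start": 60, "end": 110},
]
times = self_times(spans)
slowest = max(times, key=times.get)
print(slowest, times[slowest])
```

In this sample the two children of the root span run back to back, so the same data also hints at the second question: if the calls are independent, issuing them in parallel would shorten the request.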
A word of caution about distributed tracing
● Documentation is still rather poor
● Yet another moving part
● Can accumulate huge amounts of data
● Metrics need to be interpreted
● Commercial APM solutions might be an easier route for your use case...
Demo time...
Questions? Want to learn more?
● Come to our 2-day workshop: tinyurl.com/microservice-workshop
(November 8 and 9, or at your company on request)
● Follow us on Twitter: @containersoluti
● Read more on our blog: container-solutions.com/blog
● Or just get in touch: info@container-solutions.com