Post on 16-Oct-2020
transcript
Tracing for Java Developers
Philipp Krenn @xeraa
@xeraa
Developer !
@xeraa
Logs
Used by everyone?
@xeraa
Logs !
Events
Structure & context
@xeraa
{
  "@timestamp": "2019-11-20T16:53:01.746Z",
  "log.level": "INFO",
  "message": "[philipp] failed to log in with password [***]",
  "service.name": "gs-securing-web",
  "process.thread.name": "http-nio-8080-exec-8",
  "log.logger": "hello.AuthenticationEventListener",
  "labels.event.category": "LOGIN_FAILURE",
  "labels.user.name": "philipp",
  "labels.source.ip": "0:0:0:0:0:0:0:1",
  "labels.url.full": "/login"
}
@xeraa
Metrics
Used by most?
@xeraa
Metrics
SLI, SLA, SLO
Four golden signals: latency, traffic, errors, saturation
@xeraa
Uptime
Used by many?
@xeraa
Uptime
Synthetic / (pro)active monitoring
@xeraa
Traces
Used by some?
@xeraa
Traces
Application Performance Monitoring
Distributed Tracing
@xeraa
...the monitoring and management of performance
and availability of software applications.
— Wikipedia
@xeraa
Simpler Times
https://www.kartar.net/2019/07/intro-to-distributed-tracing/
@xeraa
Better Times for Vendors
https://www.kartar.net/2019/07/intro-to-distributed-tracing/
@xeraa
Goals
Transaction context
Reconstruct flow
Query & visualize transactions
@xeraa
Agents
Language & framework specific
Detect start & end of request, capture errors
@xeraa
Agents
Wrap operations in standard & known 3rd party libraries
Extract additional information
@xeraa
Agents
Little to no overhead
Trace & hook, not profile
@xeraa
Agents
Live in / attach to app process
public static void main
premain
agentmain
https://www.javaadvent.com/2019/12/a-beginners-guide-to-java-agents.html
@xeraa
premain
package sample;

public class SimpleAgent {
    // Invoked by the JVM before the application's main method
    public static void premain(String argument) {
        System.out.println("Hello " + argument);
    }
}
Premain-Class: sample.SimpleAgent
java -javaagent:/opt/agent.jar=World some.Program
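The slides list agentmain next to premain but only show premain. As a minimal sketch, the dynamic-attach entry point could look like the following. The class name AttachableAgent and the helper greeting are assumptions for illustration. The JVM calls agentmain when an agent is attached to an already running process, and the manifest then names the class under Agent-Class instead of Premain-Class.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent class; the name is an assumption for illustration.
public class AttachableAgent {

    // Called by the JVM when the agent is attached to a running process.
    public static void agentmain(String argument, Instrumentation instrumentation) {
        System.out.println(greeting(argument));
    }

    // Kept separate so the message can be checked without attaching to a JVM.
    static String greeting(String argument) {
        return "Attached with argument: " + argument;
    }
}
```

Attaching is typically done from a second process through the JDK's attach API (com.sun.tools.attach.VirtualMachine), which takes the target process ID and the path to the agent JAR.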
@xeraa
Instrumentation API
package sample;

import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class ClassLoadingAgent {
    public static void premain(String argument, Instrumentation instrumentation) {
        instrumentation.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(Module module,
                                    ClassLoader loader,
                                    String name,
                                    Class<?> typeIfLoaded,
                                    ProtectionDomain domain,
                                    byte[] buffer) {
                System.out.println("Class was loaded: " + name);
                return null; // null leaves the class bytes unchanged
            }
        });
    }
}
@xeraa
ByteBuddy
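package sample;

import java.lang.instrument.Instrumentation;

import net.bytebuddy.agent.builder.AgentBuilder;
import net.bytebuddy.description.type.TypeDescription;
import net.bytebuddy.dynamic.DynamicType;
import net.bytebuddy.matcher.ElementMatchers;
import net.bytebuddy.utility.JavaModule;

public class ByteBuddySampleAgent {
    public static void premain(String argument, Instrumentation instrumentation) {
        new AgentBuilder.Default()
            .type(ElementMatchers.any())
            .transform((DynamicType.Builder<?> builder,
                        TypeDescription type,
                        ClassLoader loader,
                        JavaModule module) -> {
                // The original printed an undefined variable "name"
                System.out.println("Class was loaded: " + type.getName());
                return builder;
            })
            .installOn(instrumentation);
    }
}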
@xeraa
Measuring Time
import net.bytebuddy.asm.Advice;

public class TimeMeasurementAdvice {

    // Inlined at the start of every instrumented method
    @Advice.OnMethodEnter
    public static long enter() {
        return System.currentTimeMillis();
    }

    // Inlined at every exit, including exceptional ones
    @Advice.OnMethodExit(onThrowable = Throwable.class)
    public static void exit(@Advice.Enter long start,
                            @Advice.Origin String origin) {
        long executionTime = System.currentTimeMillis() - start;
        System.out.println(origin + " took " + executionTime + " ms to execute");
    }
}
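To make the effect of the advice above easier to picture, here is a plain-Java approximation of what an instrumented method roughly behaves like. This is only a sketch without Byte Buddy: the class InlinedTimingSketch and the helper timed are hypothetical, and real advice is woven into the method's bytecode rather than passed a Supplier.

```java
import java.util.function.Supplier;

// Plain-Java approximation of what the advice adds around a method body.
final class InlinedTimingSketch {

    // Runs the body and reports the elapsed time even if the body throws,
    // mirroring onThrowable = Throwable.class.
    static <T> T timed(String origin, Supplier<T> body) {
        long start = System.currentTimeMillis();      // corresponds to OnMethodEnter
        try {
            return body.get();
        } finally {                                   // corresponds to OnMethodExit
            long executionTime = System.currentTimeMillis() - start;
            System.out.println(origin + " took " + executionTime + " ms to execute");
        }
    }

    public static void main(String[] args) {
        String result = timed("InlinedTimingSketch.main", () -> "done");
        System.out.println(result);
    }
}
```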
@xeraa
Measure Time through ByteBuddy
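package sample;

import java.lang.instrument.Instrumentation;

import net.bytebuddy.agent.builder.AgentBuilder;
import net.bytebuddy.asm.Advice;
import net.bytebuddy.description.type.TypeDescription;
import net.bytebuddy.dynamic.DynamicType;
import net.bytebuddy.matcher.ElementMatchers;
import net.bytebuddy.utility.JavaModule;

public class ByteBuddyTimeMeasuringAgent {
    public static void premain(String argument, Instrumentation instrumentation) {
        Advice advice = Advice.to(TimeMeasurementAdvice.class);
        new AgentBuilder.Default()
            // Only instrument types carrying the MeasureTime annotation
            .type(ElementMatchers.isAnnotatedWith(MeasureTime.class))
            .transform((DynamicType.Builder<?> builder,
                        TypeDescription type,
                        ClassLoader loader,
                        JavaModule module) -> {
                // Apply the timing advice to every method of the matched type
                return builder.visit(advice.on(ElementMatchers.isMethod()));
            })
            .installOn(instrumentation);
    }
}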
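The agent above matches types annotated with MeasureTime, which the slides reference but do not define. A minimal sketch of such a marker annotation follows, together with a hypothetical class (ProductService) that the agent would select. Both are assumptions for illustration; runtime retention is chosen here so the annotation stays in the class file and is visible through reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Assumed marker annotation; the slides use MeasureTime without showing it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface MeasureTime {
}

// Hypothetical application class that the agent's type matcher would select.
@MeasureTime
class ProductService {
    String findProduct(long id) {
        return "product-" + id;
    }
}
```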
@xeraa
OpenTracing
Promises vendor-neutral APIs for tracing
Scheduled for sunsetting
@xeraa
Distributed Tracing
Trace ID propagation
traceparent:
  00-                                // Version
  0af7651916cd43dd8448eb211c80319c-  // Trace ID
  b7ad6b7169203331-                  // Parent span ID
  01                                 // Flags (sampling)
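The header layout above can be read with a few lines of code. The following is a minimal sketch, assuming the version 00 layout with exactly four dash-separated fields. The class TraceParent and its method names are hypothetical, and only field lengths are checked, not the full set of validity rules from the W3C Trace Context specification.

```java
// Minimal parser for the four-field traceparent layout shown on the slide.
final class TraceParent {
    final String version;
    final String traceId;
    final String parentId;
    final String flags;

    private TraceParent(String version, String traceId, String parentId, String flags) {
        this.version = version;
        this.traceId = traceId;
        this.parentId = parentId;
        this.flags = flags;
    }

    static TraceParent parse(String header) {
        String[] parts = header.trim().split("-");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields: " + header);
        }
        // Length checks only: 2 + 32 + 16 + 2 hex characters.
        if (parts[0].length() != 2 || parts[1].length() != 32
                || parts[2].length() != 16 || parts[3].length() != 2) {
            throw new IllegalArgumentException("Unexpected field length: " + header);
        }
        return new TraceParent(parts[0], parts[1], parts[2], parts[3]);
    }

    // The lowest bit of the flags field carries the sampling decision.
    boolean isSampled() {
        return (Integer.parseInt(flags, 16) & 0x01) != 0;
    }

    // Header for an outgoing call: same trace ID, current span as the new parent.
    String childHeader(String currentSpanId) {
        return version + "-" + traceId + "-" + currentSpanId + "-" + flags;
    }
}
```

A service would parse the incoming header, create its own span ID, and send childHeader(...) with that span ID on outgoing requests so that all spans share one trace ID.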
@xeraa
OpenTelemetry
Supersedes OpenTracing + OpenCensus
Backwards compatible with both where possible
@xeraa
OpenTelemetry
Pluggable collector process
exporter instead of rebuilding your app
@xeraa
OpenTracing marketed itself as a standard from the onset, never even released a 1.0 version of its java impl before canceling the project. OpenCensus also never made 1.0 in its years before canceling itself.
OpenTelemetery has in all its time produced a single 0.2 release
— https://twitter.com/adrianfcole/status/1223778238469566464
@xeraa
Sampling
Representative subset
Random or "interesting" traces
@xeraa
Elastic's Current Sampling Strategy
Single service: Head-based / random
Distributed: First service chooses
@xeraa
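Head-based sampling as described above can be sketched as follows: the first service decides randomly, and downstream services follow the decision carried in the traceparent flags. This is an illustration under stated assumptions, not Elastic's implementation; the class HeadSampler and its methods are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative head-based sampler; not the agent's actual code.
final class HeadSampler {
    private final double sampleRate; // fraction of traces to keep, 0.0 to 1.0

    HeadSampler(double sampleRate) {
        if (sampleRate < 0.0 || sampleRate > 1.0) {
            throw new IllegalArgumentException("sampleRate must be between 0 and 1");
        }
        this.sampleRate = sampleRate;
    }

    // First service in a trace: random decision at the start of the request.
    boolean decideAtRoot() {
        return ThreadLocalRandom.current().nextDouble() < sampleRate;
    }

    // Downstream service: follow the upstream decision from the flags field,
    // or decide locally when no trace context was received.
    boolean decide(String incomingFlags) {
        if (incomingFlags == null) {
            return decideAtRoot();
        }
        return (Integer.parseInt(incomingFlags, 16) & 0x01) != 0;
    }
}
```

Because the decision is made once at the head of the trace, a trace is either recorded in every participating service or in none, which keeps sampled traces complete.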
APM Server
Separate process for receiving traces from agents
Transforms and enriches data
@xeraa
Elastic's APM Server
Written in Go, using the Beats framework
Optional authentication & rate limiting
@xeraa
Automatic Instrumentation
Minimal app modification
@xeraa
— .NET:
app.UseElasticApm(Configuration);
— Node.js:
const apm = require('elastic-apm-node').start()
— Python:
apm = elasticapm.instrument()
— Ruby on Rails:
config.elastic_apm.service_name = 'MyApp'
@xeraa
Automatic Instrumentation
Java
java -javaagent:/app/elastic-apm-agent.jar -jar /app/app.jar
@xeraa
Automatic Instrumentation
Configuration
environment:
  - ELASTIC_APM_SERVICE_NAME=${ELASTIC_APM_SERVICE_NAME:-my-app}
  - ELASTIC_APM_SERVER_URL=${ELASTIC_APM_SERVER_URL:-http://apm-server:8200}
  - ELASTIC_APM_APPLICATION_PACKAGES=net.xeraa.my-app
  - ELASTIC_APM_ENABLE_LOG_CORRELATION=true
  - ELASTIC_APM_ENVIRONMENT=production
@xeraa
Supported Technologies
https://www.elastic.co/guide/en/apm/agent/java/1.x/supported-technologies-details.html
@xeraa
More Tricks
Inferring spans with async-profiler (20ms default)
Avoid safepoint bias and stop-the-world pauses compared to ThreadMXBean#getThreadInfo, Thread#getStackTrace(), ...
https://psy-lob-saw.blogspot.com/2016/02/why-most-sampling-java-profilers-are.html
@xeraa
PS: Kubernetes
Injection through initContainers
https://www.elastic.co/blog/using-elastic-apm-java-agent-on-kubernetes-k8s
@xeraa
Manual Instrumentation
Built-in instrumentation support
Wrap code manually
@xeraa
Golang
http.Server{
    Handler: myHandler,
}
!
import "go.elastic.co/apm/module/apmhttp"

http.Server{
    Handler: apmhttp.Wrap(myHandler),
}
@xeraa
Manual Instrumentation
<dependencies>
    <dependency>
        <groupId>co.elastic.apm</groupId>
        <artifactId>apm-agent-api</artifactId>
        <version>1.13.0</version>
    </dependency>
    <dependency>
        <groupId>co.elastic.apm</groupId>
        <artifactId>apm-opentracing</artifactId>
        <version>1.13.0</version>
    </dependency>
</dependencies>
@xeraa
Manual Instrumentation
@GetMapping(value = "/products")
@CaptureSpan("Annotation products span")
Collection<ProductList> products() {
    ElasticApm.currentSpan().addTag("foo", "bar");
    return productRepository.findAllList();
}
@xeraa
Manual Instrumentation
OpenTracing
@GetMapping("/products/{productId}")
ProductDetail product(@PathVariable long productId) {
    final Span span = tracer.buildSpan("OpenTracing product span")
        .withTag("productId", Long.toString(productId))
        .start();
    try (Scope scope = tracer.scopeManager().activate(span, false)) {
        return productRepository.getOneDetail(productId);
    } finally {
        span.finish();
    }
}
@xeraa
PS: Jaeger Intake
apm-server:
  jaeger:
    grpc:
      enabled: true
      host: "localhost:14250"
@xeraa
Transactions
Errors
Metrics
Data in a Trace
Just another index
Elastic Common Schema (ECS)
@xeraa
Scale & Cleanup: ILM (modified)
{
  "policy" : {
    "phases" : {
      "hot" : {
        "min_age" : "0ms",
        "actions" : {
          "rollover" : {
            "max_size" : "50gb",
            "max_age" : "30d"
          }
        }
      },
      "delete" : {
        "min_age" : "30d",
        "actions" : {
          "delete" : { }
        }
      }
    }
  }
}
Overhead*: Latency
Report on background thread
Avoid data structures with lock contention
Avoid GC with an object pool
* https://discuss.elastic.co/t/dec-15th-2018-en-apm-overhead-of-the-java-agent/160559
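The object pool and the hand-off to a reporter can be sketched as below. This is a simplified illustration, not the agent's code: the names SpanPoolSketch and SpanRecord are hypothetical, and ArrayBlockingQueue is used only for brevity. It is lock-based, so an implementation following the second point above would likely pick a lock-free queue instead.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative object pool with a non-blocking hand-off queue.
final class SpanPoolSketch {

    // Mutable record that is reused instead of being garbage collected.
    static final class SpanRecord {
        String name;
        long durationMicros;

        void reset() {
            name = null;
            durationMicros = 0;
        }
    }

    private final BlockingQueue<SpanRecord> pool;
    private final BlockingQueue<SpanRecord> reportQueue;

    SpanPoolSketch(int capacity) {
        pool = new ArrayBlockingQueue<>(capacity);
        reportQueue = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < capacity; i++) {
            pool.offer(new SpanRecord());
        }
    }

    // Application thread: reuse a pooled record, allocate only if the pool is empty.
    SpanRecord acquire() {
        SpanRecord record = pool.poll();
        return record != null ? record : new SpanRecord();
    }

    // Application thread: hand off without blocking; drop the record if the queue is full.
    boolean submit(SpanRecord record) {
        boolean accepted = reportQueue.offer(record);
        if (!accepted) {
            recycle(record);
        }
        return accepted;
    }

    // Reporter thread: take one record, "send" it, and return it to the pool.
    boolean reportOne() {
        SpanRecord record = reportQueue.poll();
        if (record == null) {
            return false;
        }
        System.out.println(record.name + " took " + record.durationMicros + " us");
        recycle(record);
        return true;
    }

    private void recycle(SpanRecord record) {
        record.reset();
        pool.offer(record); // silently discarded if the pool is already full
    }
}
```

In this sketch a daemon thread would call reportOne() in a loop, so the request thread only pays for acquire() and submit().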
@xeraa
Lifecycle
https://github.com/elastic/apm-agent-java/blob/master/apm-agent-core/README.md#lifecycle
In single-threaded benchmarks, our Java agent imposes an overhead in the order of single-digit microseconds (µs) up to the 99.99th percentile.
@xeraa
Overhead: Memory
Static overhead for object pools and small buffers
Order of a couple of megabytes
@xeraa
When disabling header recording, the agent allocates less than one byte for recording an HTTP request and one JDBC (SQL) query, including reporting [...]
@xeraa
"Our agent has zero overhead"
!
@xeraa
Conclusion
@xeraa
Tools
Logs
Metrics
Uptime
Traces
@xeraa
Distributed Tracing
Why & how
@xeraa
Elastic APM & Stack
Cutting edge, free, unified
@xeraa
Try It
https://github.com/elastic/opbeans-java
@xeraa