Observable Node.js Applications at Scale
Yunong Xiao, Senior Node.js Engineer, UI Platform, Netflix
@yunongx, [email protected]
July 2015, EnterpriseJS
Node.js @ Netflix
Production is War
What’s Wrong?
Increased Errors, Increased Latency
Memory Leak
Software is Complex
Drunk Man Anti-Method
DATA
“It is a capital mistake to theorize before one has DATA. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”
Sherlock Holmes
-A Scandal in Bohemia
Observability From the Ground Up
Development: Hard
Production: Harder
Production w/ Direct Customer Impact: Insane
Dev === Prod
node-bunyan
Streaming JSON logging format
Streaming JSON
Unix Philosophy
grep(1), cut(1), wc(1), awk(1), perl(1), jq(1), json(1), daggr(1)
Query Logs in Real Time
$ cat http.log | grep audit | bunyan -0 --strict -c 'this.res.statusCode === 200' | json -ga 'req.timers["fetch-api"]' req.url | awk '{print $1 " " substr($2,0,7)}' | daggr -k 2 -v 1 quantize
Illegible!
bunyan(1) CLI
{ "name": "demo", "hostname": "Yunongs-MacBook-Pro.local", "pid": 35919, "level": 30, "authInfo": { "username": "yunong", "password": "xxxxxx" }, "msg": "got request", "time": "2015-07-07T21:46:01.317Z", "v": 0 }
Features
• Lightweight API
• Log levels: trace, debug, info, warn, error, fatal
• Extensible Streams interface.
• Custom object rendering with serializers.
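The features above can be illustrated with a minimal sketch in plain Node.js. This is not Bunyan's implementation or API: `createLogger` here is a hypothetical stand-in that only mimics the record shape shown on the earlier slide (one JSON object per line, numeric levels, a serializer that masks the password).

```javascript
'use strict';
const os = require('os');

// Numeric levels matching the record shown earlier (info is 30).
const LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60 };

// Hypothetical helper (not Bunyan): returns a logger that writes one JSON
// object per line to `stream`, dropping records below `level`.
function createLogger({ name, stream = process.stdout, level = 'info', serializers = {} }) {
  const threshold = LEVELS[level];
  const logger = {};
  for (const [levelName, levelValue] of Object.entries(LEVELS)) {
    logger[levelName] = (fields, msg) => {
      if (levelValue < threshold) return;
      // Apply a serializer to any field that has one registered.
      const rendered = {};
      for (const [key, value] of Object.entries(fields)) {
        rendered[key] = serializers[key] ? serializers[key](value) : value;
      }
      const record = Object.assign(
        { name, hostname: os.hostname(), pid: process.pid, level: levelValue },
        rendered,
        { msg, time: new Date().toISOString(), v: 0 }
      );
      stream.write(JSON.stringify(record) + '\n');
    };
  }
  return logger;
}

// Serializer that masks the password before it reaches the log.
const authInfo = (info) => ({ username: info.username, password: 'xxxxxx' });

// Collect lines in memory so the output can be inspected, then print them.
const lines = [];
const log = createLogger({
  name: 'demo',
  stream: { write: (line) => lines.push(line) },
  serializers: { authInfo },
});
log.debug({}, 'dropped: below the info threshold');
log.info({ authInfo: { username: 'yunong', password: 'secret' } }, 'got request');
process.stdout.write(lines.join(''));
```

Because every record is a single line of JSON, the output can be piped straight into line-oriented tools such as grep(1) or json(1).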
HTTP Request
• Make API requests
• Persist state to file system
• Query database
• Update caches
Async HTTP Request
• Make API requests
• Persist state to file system
• Query database
• Update caches
node-vasync
• Observable Async workflow
• https://github.com/davepacheco/node-vasync
async f() state
• failed f()
• successful f()
• pending f()
• finished f()
• # of errors
Example
cb() not invoked
Results
function three pending
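The idea behind this result can be sketched without the library. The runner below is a hypothetical stand-in, not vasync's code, and its field names are illustrative. It returns a status object that is updated in place, so a function whose callback is never invoked stays visibly pending.

```javascript
'use strict';

// Hypothetical runner (not vasync): starts every function and returns a
// status object that is updated in place as callbacks fire.
function observableParallel(funcs, done) {
  const status = {
    operations: funcs.map((fn) => ({ funcname: fn.name || '(anon)', status: 'pending' })),
    successes: [],
    ndone: 0,
    nerrors: 0,
  };
  funcs.forEach((fn, i) => {
    fn((err, result) => {
      const op = status.operations[i];
      if (op.status !== 'pending') return; // ignore a second callback
      if (err) {
        op.status = 'fail';
        op.err = err;
        status.nerrors++;
      } else {
        op.status = 'ok';
        op.result = result;
        status.successes.push(result);
      }
      status.ndone++;
      if (status.ndone === funcs.length) done(status);
    });
  });
  return status;
}

function one(cb) { cb(null, 'one done'); }
function two(cb) { cb(new Error('two failed')); }
function three(cb) { /* bug: cb() is never invoked */ }

let finished = false;
const status = observableParallel([one, two, three], () => { finished = true; });

// Inspect the state at any time, for example from a REPL or a core dump.
const pending = status.operations
  .filter((op) => op.status === 'pending')
  .map((op) => op.funcname);
console.log('pending:', pending, 'errors:', status.nerrors, 'finished:', finished);
```

Keeping the state in an ordinary object is the design point: it can be read from logs, a REPL, an HTTP endpoint, or a core dump without attaching a debugger.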
How Do I See this in Prod?
• Logs
• Core Dumps
• REPL
• HTTP API
Observable Node.js HTTP
• Observability
• Metrics
• Bunyan integration
restify http://restify.com
http://github.com/restify/node-restify
Audit Logs
• status code
• request ID (UUID)
• URL
• request headers
• response headers
• individual handler timers
• req latency
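A rough sketch of how such an audit record could be assembled follows. It is not restify's implementation: `handleAndAudit` is a hypothetical helper, the handlers are synchronous for brevity (real handlers are asynchronous), and the field names are assumptions, apart from `req.timers`, `req.url`, `res.statusCode`, and `latency`, which the queries elsewhere in this deck rely on.

```javascript
'use strict';

// Hypothetical helper (not restify): runs named handlers in order, times
// each one, and returns an audit record for the request.
// `now` is injectable so the timing can be made deterministic.
function handleAndAudit(handlers, req, res, now = Date.now) {
  const start = now();
  const timers = {};
  for (const handler of handlers) {
    const t0 = now();
    handler(req, res);
    timers[handler.name] = now() - t0; // per-handler latency
  }
  return {
    audit: true,
    req_id: req.id,
    req: { method: req.method, url: req.url, headers: req.headers, timers },
    res: { statusCode: res.statusCode, headers: res.headers },
    latency: now() - start, // total request latency
  };
}

// Two illustrative handlers.
function authenticate(req, res) { req.user = 'yunong'; }
function fetchApi(req, res) { res.statusCode = 200; }

// Fake clock that advances 5 units on every call.
let tick = 0;
const clock = () => (tick += 5);

const auditRecord = handleAndAudit(
  [authenticate, fetchApi],
  { id: 'hypothetical-uuid-1', method: 'GET', url: '/api', headers: { accept: '*/*' } },
  { statusCode: 0, headers: {} },
  clock
);
console.log(JSON.stringify(auditRecord));
```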
Too many logs!
Request Capture Stream
• Captures all log statements at trace level in memory.
• Dump all logs for a particular request only on error.
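The two bullets above can be sketched as a per-request buffer that stays silent until an error arrives. `RequestCapture` is a hypothetical class written for illustration, not restify's request capture stream. The `req_id` field name and the level threshold of 50 (error) are assumptions.

```javascript
'use strict';

// Hypothetical capture buffer: keeps the last `limit` records per request ID
// in memory and forwards them to `sink` only when that request logs an error.
class RequestCapture {
  constructor({ limit = 100, dumpLevel = 50, sink }) {
    this.limit = limit;
    this.dumpLevel = dumpLevel; // 50 is the error level
    this.sink = sink;
    this.buffers = new Map(); // req_id -> array of records
  }

  write(record) {
    const id = record.req_id;
    if (!this.buffers.has(id)) this.buffers.set(id, []);
    const buf = this.buffers.get(id);
    buf.push(record);
    if (buf.length > this.limit) buf.shift(); // drop the oldest record
    if (record.level >= this.dumpLevel) {
      buf.forEach((r) => this.sink(r)); // dump everything for this request
      this.buffers.delete(id);
    }
  }

  // Call when a request completes without error to free its buffer.
  end(id) {
    this.buffers.delete(id);
  }
}

const dumped = [];
const capture = new RequestCapture({ limit: 10, sink: (r) => dumped.push(r) });
capture.write({ req_id: 'a', level: 10, msg: 'start' });
capture.write({ req_id: 'b', level: 10, msg: 'start' });
capture.write({ req_id: 'b', level: 20, msg: 'querying database' });
capture.write({ req_id: 'a', level: 30, msg: 'done' });
capture.end('a'); // request a succeeded: its trace logs are discarded
capture.write({ req_id: 'b', level: 50, msg: 'database timeout' });
console.log(dumped.map((r) => r.req_id + ': ' + r.msg));
```

Only the failing request pays the cost of verbose output. Successful requests never reach the log file at trace level.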
Scoped Child Loggers
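A scoped child logger binds context once so that every later record carries it. Bunyan exposes this as `log.child(...)`. The sketch below is a simplified stand-in written in plain Node.js, with `makeLogger` and all field values being hypothetical.

```javascript
'use strict';

// Hypothetical logger factory: `bound` fields are merged into every record.
function makeLogger(bound, sink) {
  return {
    info(fields, msg) {
      sink(Object.assign({}, bound, fields, { level: 30, msg }));
    },
    // A child inherits the parent's bound fields and adds its own.
    child(extra) {
      return makeLogger(Object.assign({}, bound, extra), sink);
    },
  };
}

const records = [];
const root = makeLogger({ name: 'demo' }, (r) => records.push(r));

// One child per request, one grandchild per component.
const reqLog = root.child({ req_id: 'hypothetical-uuid-1' });
const dbLog = reqLog.child({ component: 'db' });

reqLog.info({}, 'handling request');
dbLog.info({ rows: 3 }, 'query finished');
records.forEach((r) => console.log(JSON.stringify(r)));
```

Because the request ID is attached automatically, a single grep for that ID recovers every log line the request produced, whichever component wrote it.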
Logging
• Native Bunyan integration
• Request capture stream
• Audit logs
• Scoped child loggers
What about Data Processing?
Examples
Find all logs for a specific request.
$ grep $uuid restify.log
Examples
Count the # of non-200 responses.
$ grep restify-audit restify.log | bunyan -c 'this.res.statusCode !== 200' -0 | wc -l
Examples
Show all requests that took longer than 200ms.
$ grep restify-audit restify.log | bunyan -c 'this.latency > 200' -0 | wc -l
Advanced Example
Request latency distribution by URL
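The same aggregation can be sketched in Node.js. The code below is an illustration, not daggr's implementation. It assumes each audit record has `audit`, `req.url`, and a numeric `latency` field, and groups latencies into power-of-two buckets per URL, which is how a quantize-style histogram is usually laid out.

```javascript
'use strict';

// Lower bound of the power-of-two bucket that contains `value`.
function bucketOf(value) {
  if (value < 1) return 0;
  let bucket = 1;
  while (bucket * 2 <= value) bucket *= 2;
  return bucket;
}

// Builds Map(url -> Map(bucket -> count)) from newline-delimited JSON records.
function latencyByUrl(lines) {
  const histograms = new Map();
  for (const line of lines) {
    let rec;
    try {
      rec = JSON.parse(line);
    } catch (e) {
      continue; // skip lines that are not JSON
    }
    if (!rec || !rec.audit || !rec.req || typeof rec.latency !== 'number') continue;
    if (!histograms.has(rec.req.url)) histograms.set(rec.req.url, new Map());
    const hist = histograms.get(rec.req.url);
    const bucket = bucketOf(rec.latency);
    hist.set(bucket, (hist.get(bucket) || 0) + 1);
  }
  return histograms;
}

// Made-up sample records for illustration only.
const sampleLines = [
  '{"audit":true,"req":{"url":"/api"},"latency":3}',
  '{"audit":true,"req":{"url":"/api"},"latency":5}',
  '{"audit":true,"req":{"url":"/api"},"latency":130}',
  '{"audit":true,"req":{"url":"/health"},"latency":1}',
  '{"msg":"not an audit record"}',
  'not json at all',
];

const histograms = latencyByUrl(sampleLines);
for (const [url, hist] of histograms) {
  for (const [bucket, count] of [...hist].sort((a, b) => a[0] - b[0])) {
    console.log(url, '>=' + bucket, '#'.repeat(count));
  }
}
```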
restify + Bunyan
• Streaming JSON.
• Processing using Unix tools is easy.
• Helpful tools:
• Unix: cut(1), wc(1), grep(1), awk(1), perl(1), json(1), jq(1)…
• JSON: https://github.com/trentm/json (npm install -g json)
• daggr: https://github.com/joyent/daggr (npm install -g daggr)
Increased Errors, Increased Latency
restify
Core Contributors at Netflix
@stinkydofu, @mjr578, @yunongx
restify
Interested? Contribute to restify!
restify.com github.com/restify/node-restify
distributed processing
• elasticsearch • spark • hive
bunyan, restify
audit logs
• req id • req headers • req latency • handler latencies • URL • res code • Errors • res headers
Processing w/ Unix tools
• grep(1) • awk(1) • cut(1) • wc(1) • sort(1) • json(1) • bunyan(1) • …
req scoped logs: application specific context
req capture stream: dumps all logs on error
DTrace*: req latency, handler latencies
* Where available
Production Observability
Observable Toolkit
• vasync: https://github.com/davepacheco/node-vasync
• Observable async operations
• bunyan: https://github.com/trentm/node-bunyan
• Streaming JSON logs
• restify: restify.com
• Observable REST applications
• Unix Tools
• Easily process JSON logs
One more thing…
DTrace
Thanks
• Questions? We’re hiring!
• @yunongx
• http://restify.com