Date post: | 06-May-2015 |
Category: |
Technology |
Upload: | bcantrill |
node.js in production: Reflections on three years of riding the unicorn
SVP, Engineering
Bryan Cantrill
@bcantrill
Tuesday, December 3, 13
Production systems
• Production systems are ones doing real work: when they misbehave, users or other systems are affected
• Production systems value reliability, performance and ease of deployment — usually in that order
• Contrast with development systems, which value ease of development and speed of development, in that order
• These values can be in tension: new languages and environments typically arise for their development values, not their production ones
• Would node.js be any different?
node.js advantages
• In terms of production suitability, node.js had — and still has — a couple of major advantages going for it:
• It’s not a new language
• It leverages extant (Unix) abstractions
• It’s built on a VM (V8) that itself was designed for performance
• Its pure event-oriented model aligns ease of programming with scalability with respect to load
• As the stewards of both node and SmartOS, Joyent had another advantage: we could change, improve or leverage SmartOS to accommodate node in production
node.js challenges
• But node.js also has a couple of major challenges:
• Single-threaded execution of JavaScript means that compute-bound code can entirely impede progress
• JavaScript closures make it easy to accidentally reference memory
• Because node.js is often used to connect backend components, failure to propagate back pressure can induce memory explosion and death
• High performance VM also implies inscrutable core dumps and very limited instrumentation
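The back pressure challenge above can be made concrete with a minimal sketch using only node's core stream module. It assumes a fast producer feeding a slower consumer; the helper names (makeSource, makeSlowSink, copyWithBackPressure) are hypothetical and are not taken from the talk. The point is only that the producer is paused whenever write() reports a full buffer, and resumed on 'drain', so that unsent data does not pile up in memory.

```javascript
'use strict';
const { Readable, Writable } = require('stream');

// Hypothetical fast producer: hands out `total` 1 KB chunks as fast as it is asked.
function makeSource(total) {
  let sent = 0;
  return new Readable({
    read() {
      if (sent < total) {
        sent++;
        this.push(Buffer.alloc(1024));
      } else {
        this.push(null); // signal end of data
      }
    }
  });
}

// Hypothetical slow consumer: acknowledges each chunk on a later event-loop turn.
function makeSlowSink(stats) {
  return new Writable({
    highWaterMark: 4096,
    write(chunk, _encoding, done) {
      stats.bytes += chunk.length;
      setImmediate(done); // stand-in for a slower backend
    }
  });
}

// Propagate back pressure by hand: stop reading when the sink's buffer is
// full (write() returns false) and resume only once it has drained.
function copyWithBackPressure(source, sink, callback) {
  source.on('data', (chunk) => {
    if (!sink.write(chunk)) {
      source.pause();
      sink.once('drain', () => source.resume());
    }
  });
  source.on('end', () => sink.end());
  sink.on('finish', callback);
}

const stats = { bytes: 0 };
copyWithBackPressure(makeSource(64), makeSlowSink(stats), () => {
  console.log('copied %d bytes', stats.bytes);
});
```

In practice source.pipe(sink) performs the same pause/resume dance; the manual version is shown only to make the mechanism visible. Ignoring the return value of write() is the failure mode the slide warns about.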
August 2010: DTrace in node.js
• Added simple user-level statically defined tracing (USDT) probes for node.js on platforms that support DTrace (e.g., Mac OS X, SmartOS)
• Probes were around connection establishment, serving HTTP requests, etc.
• Allowed questions to be dynamically asked of running, production node.js servers, e.g.:
    dtrace -n 'node*:::http-server-request{ printf("%s of %s from %s\n", args[0]->method, args[0]->url, args[1]->remoteAddress)}'
    dtrace -n http-server-request'{ @[args[1]->remoteAddress] = count()}'
    dtrace -n gc-start'{self->ts = timestamp}' \
        -n gc-done'/self->ts/{@ = quantize(timestamp - self->ts)}'
August 2010: Deploying 0.2.x
• In August 2010, we deployed our first node.js-based service into production: a Node Knockout leader-board that used node.js DTrace probes to geolocate connections to contestants in real-time
• Results were promising: it was surprisingly easy to develop and deploy a node.js-based service, and the service consumed very little CPU
• Watching the Node Knockout contestants in production revealed they were all light on CPU:
• But there was a storm cloud...
August 2010: Deploying 0.2.x, cont.
• We had a memory leak that resulted in heap exhaustion after several hours under heavy load
• Our service was stateless and load balanced for HA, so this was more disconcerting than debilitating...
• ...but we also had quite a few contestants that would run their RSS up and crash; there was clearly a larger issue:
February 2011: 0.4.0
• In February 2011, we deployed our first major node.js-based service (on 0.4.0)
• Service was able to be built remarkably quickly — but with some pain-points around Connect
• Despite being potentially a compute-bound service, CPU consumption was (again) a non-issue
• And with an updated node (and many fixed node leaks), memory consumption wasn’t necessarily as acute...
• …but we hit our first “spinning black hole” problem
January 2011: node-dtrace-provider
• Our DTrace probes in node were proving to be too low-level for higher-level services — we needed to allow USDT probes to be expressed in JavaScript
• Fortunately, DTrace community member Chris Andrews extended his libusdt to node.js, allowing statically defined probes in JavaScript, e.g.:
    var d = require('dtrace-provider');  // module import, implied on the slide
    var dtp = d.createDTraceProvider('foo');
    var probe = dtp.addProbe('foo-start');
    probe.fire(function (p) { return ([ { bar: 123, baz: 'bar' } ]); });
April 2011: Restify
• Based on our experiences with Connect/Express, we wanted to build a node module that was purpose-built to implement HTTP-based API endpoints
• Based on Chris Andrews’ work, we wanted to have first class support for DTrace
• Joyent’s Mark Cavage developed node-restify, which quickly became the foundation for all of our services
• Built-in DTrace support allows full observability into per-route/per-handler latency — a capability that we could not live without at this point
November 2011: MDB support for V8
• In mid-2011, Joyent’s Dave Pacheco dared to dream the impossible dream: full postmortem support for V8 for MDB, the debugger native to SmartOS
• Several unspeakable layer violations later, mdb_v8 brought postmortem debugging to node.js
• ::jsstack prints full stack including both native C++ frames and JavaScript frames
• ::jsprint prints JavaScript objects — from the dump
• Thanks to mdb_v8, we were able to go back to a core dump from that infinite loop in our service deployed several months earlier — and nail it
December 2011: DTrace ustack helper
• mdb_v8 was actually a way station to an even bolder dream: a DTrace ustack helper for node.js
• A ustack helper is a bit of code that accompanies a binary and assists DTrace in probe context to resolve stack frames to their higher-level names
• Once completed, allows user-level stack traces to be associated with in-kernel events — like profiling events
• Can use the DTrace profile provider to determine how a node.js program is consuming CPU via stack sampling
December 2011: Flame graphs
• Poring over stack traces can make hot functions difficult to visualize
• Joyent’s Brendan Gregg developed flame graphs, which allow us to easily visualize thousands of sampled stacks:
January 2012: Bunyan
• Logging was becoming more and more of a problem for us — especially as we were developing distributed systems in node.js
• Joyent’s Trent Mick developed node-bunyan, a simple and fast JSON logging library for node.js
• Provides standardized, JSON, line-based log output that can be easily processed with JSON tools, e.g.:
    {"name":"moray","hostname":"d1cfb6c7-c975-4ed8-a689-fb18f94b6bfc","pid":8393,"component":"manatee","path":"/manatee/sdc/election","level":20,"db":{"available":2,"max":15,"size":2,"waiting":0},"options":{"async":false,"read":true},"msg":"pg: entered","time":"2013-12-03T02:54:24.565Z","v":0}
• Also includes command line tool, bunyan, for displaying Bunyan logs
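As a rough illustration of why line-based JSON logs are easy to process, here is a small sketch that parses such output and filters it by level using nothing but JSON.parse. It is not Bunyan's implementation or its API: parseLogLines and atLeast are hypothetical helpers, the first sample record is abbreviated from the one above, and the second record is invented purely for the example. The numeric levels are Bunyan's standard ones.

```javascript
'use strict';

// Bunyan's numeric log levels.
const LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60 };

// Hypothetical helper: parse newline-delimited JSON, skipping blank or non-JSON lines.
function parseLogLines(text) {
  const records = [];
  for (const line of text.split('\n')) {
    if (line.trim() === '') {
      continue;
    }
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      // Not a JSON record (e.g. stray output from another tool): ignore it.
    }
  }
  return records;
}

// Hypothetical helper: keep only records at or above the named level.
function atLeast(records, levelName) {
  const minimum = LEVELS[levelName];
  return records.filter((rec) => rec.level >= minimum);
}

const sample = [
  // Abbreviated from the record shown on the slide.
  '{"name":"moray","pid":8393,"component":"manatee","level":20,"msg":"pg: entered","time":"2013-12-03T02:54:24.565Z","v":0}',
  // Invented record, for illustration only.
  '{"name":"moray","pid":8393,"component":"manatee","level":40,"msg":"hypothetical warning","time":"2013-12-03T02:54:25.000Z","v":0}',
  'this line is not JSON'
].join('\n');

const warnings = atLeast(parseLogLines(sample), 'warn');
console.log(warnings.map((rec) => rec.msg));
```

Because every record is a self-describing object on a single line, the same filtering can be done with any JSON-aware tool, which is the property the slide is pointing at.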
February 2012: npm shrinkwrap
• npm allows for fine-grained semver control over package dependencies, but we found that nested dependencies could result in non-replicable installs
• “npm shrinkwrap” generates a file that shrinkwraps all nested dependencies into npm-shrinkwrap.json, thereby locking down all nested versions
• Guarantees that all installs will have same semver versions of dependencies
• Doesn’t necessarily guarantee identical installs, however; for this, one needs private npm repositories
April 2012: node-vasync
• There are a number of modules that deal with some of the mechanics of asynchronous control flow…
• But we found that we needed one that emphasized debugging
• node-vasync captures a number of popular flow patterns and allows state to be inspected via MDB
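To show what "allows state to be inspected" can mean for a flow-control helper, here is a minimal sketch of a parallel pattern that records its progress in a plain object, so that a debugger or a core dump can reveal which operations are still outstanding. This is written from scratch for illustration and should not be read as node-vasync's code or API; the function name, the field names and the sample operations are all assumptions.

```javascript
'use strict';

// Hypothetical helper: run all funcs concurrently and track their state
// in an ordinary object rather than only in closure variables.
function parallel(funcs, callback) {
  const status = {
    operations: funcs.map((func) => ({
      funcname: func.name || '(anonymous)',
      status: 'pending'
    })),
    ndone: 0,
    nerrors: 0
  };

  if (funcs.length === 0) {
    setImmediate(callback, null, status);
    return status;
  }

  funcs.forEach((func, i) => {
    func((err, result) => {
      const op = status.operations[i];
      op.status = err ? 'fail' : 'ok';
      op.err = err || null;
      op.result = result;
      if (err) {
        status.nerrors++;
      }
      if (++status.ndone === funcs.length) {
        callback(status.nerrors > 0 ? new Error('some operations failed') : null, status);
      }
    });
  });

  // The caller can keep a reference to the live state object.
  return status;
}

const inFlight = parallel([
  function lookupUser(cb) { setImmediate(cb, null, 'user'); },
  function lookupHost(cb) { setImmediate(cb, new Error('boom')); }
], (err, final) => {
  console.log(err ? err.message : 'ok', final.operations.map((op) => op.status));
});

// While the operations are outstanding, the state is visible as plain data.
console.log(inFlight.operations.map((op) => op.funcname + ': ' + op.status));
```

The design choice worth noting is that named functions and an explicit state object cost almost nothing at runtime, but they turn a hung request from an opaque set of closures into something that can be found and read after the fact.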
May 2012: ::findjsobjects
• Building on Dave Pacheco’s mdb_v8, we implemented a debugger command that iterates over all of memory in a core dump, looking for JavaScript objects
• Entirely brute force, but allows one to take a swing at a nasty node.js issue: semantic memory leaks
    > ::findjsobjects
      OBJECT #OBJECTS #PROPS CONSTRUCTOR: PROPS
    95709ac1      195      3 Object: socket, type, handle
    957093f9       66      9 Object: uid, windowsVerbatimArguments, stdio, …
    95f13181      130      5 <anonymous> (as exports.StringDecoder): …
    8432ff55      222      3 Buffer: length, offset, parent
    843304dd       91      9 Object: refreservation, creation, name, type, …
    8432cc55       99      9 Object: time, msg, level, hostname, pid, action, …
    95f08545       66     14 ChildProcess: _closesNeeded, stdio, …
    8432f2e1      546      2 Array
    9570cafd       47     24 Object: <sliced string>, <sliced string>, …
    8432be95      415      3 Array
    8432fb09       67     19 Socket: errorEmitted, _bytesDispatched, …
May 2012: ::findjsobjects -p
• Searching by property name allows one to find particular objects in the JavaScript heap, e.g.:
    > ::findjsobjects -p ip4addr | ::findjsobjects | ::jsprint -a
    8432b109: {
        ip4addr: 9aee115d: "10.88.88.200",
        VLAN: 9aee1199: "0",
        Host Interface: 9aee1185: "e1000g0",
        Link Status: 9aee1175: "up",
        MAC Address: 9aee113d: "02:08:20:47:93:82",
    }
    …
• While designed for postmortem debugging, this allows mdb_v8 to be used for in situ debugging in development
• Also guides one to a best practice: towards unique property names (which we have historically done in the operating system via structure prefixing)
July 2012: node-fast
• While HTTP makes it very easy to put together a distributed system, parsing and connection management can become prohibitively expensive
• In building Manta, we found that we needed something lighter/faster; Joyent’s Mark Cavage built node-fast
• Only what you need: fully async/duplex/persistent connections, simple on-wire protocol (JSON), etc.
• None of what you don’t want: no IDL madness, no object model, no binary translation madness, etc.
• Deliberately light and limited — HTTP is still the right answer until it isn’t
October 2012: Bunyan + DTrace
• With all of our services using Bunyan, we could enable dynamic logging by adding DTrace USDT probes
• Can use the raw DTrace probes:
    # dtrace -qn log-debug'{printf("%s\n", copyinstr(arg0))}' -x strsize=8k
    {"name":"wf-moray-backend","hostname":"414ffb35-adee-47b7-bdf4-d21cb039386c","pid":10952,"component":"MorayClient","host":"10.99.99.17","port":2020,"req_id":"bddb180f-1770-edcf-8df2-b3a81d97e9b1","level":20,"bucket":"wf_runners","key":"414ffb35-adee-47b7-bdf4-d21cb039386c","value":{"active_at":"2013-12-03T07:22:25.125Z","idle":false},"msg":"putObject: entered","time":"2013-12-03T07:22:25.135Z","v":0}
    ...
• Added the json() subroutine to DTrace to make this easier to process
• Can also use “bunyan -p” and avoid the lower-level DTrace details entirely
May 2013: --abort-on-uncaught-exception
• Crash dumps are great, but aborting only after an uncaught exception has unwound the stack makes it very difficult to determine the true origin of the exception
• Dave Pacheco implemented a V8 patch to induce a process abort (and a core dump) at the point where an uncaught exception is thrown
• This allows us to use postmortem debugging to debug our everyday logic errors
• Available starting in 0.10.x — we use it wherever we have it!
July 2013: Thoth
• One of the most important systems we have built in node is Manta, our object store featuring in situ compute
• Manta is an excellent platform for building data-based services — especially for large data objects
• We built manta-thoth, a platform for core and crash dump analysis that allows us to debug core dumps without moving them
• Thoth has become critically important for us to track and automatically debug production node.js services
December 2013: Dump analysis on Linux
• Postmortem debugging has been a (the) tremendous breakthrough for node.js in production…
• ...but despite node’s postmortem support all being open source, it has been limited to SmartOS
• Some have toyed with porting MDB to Linux; this is in principle possible, but will be rough sledding
• Joyent’s TJ Fontaine (of node core fame) observed what we had done with dump analysis on Manta and had a simpler idea…
• What about making Linux dumps consumable on SmartOS — and therefore Manta?
December 2013: Linux support in libproc
• Over the course of a multiday engineering hackathon, TJ and Joyent’s Max Bruning added support for Linux crash dumps in SmartOS’s libproc
• Fortunately, because of the way the postmortem work was done by Dave Pacheco, it Just Works
• Do this yourself: https://gist.github.com/tjfontaine/de104fe058300a51f7cf
• For Linux users: put your Linux dumps to Manta, and you can finally debug those pesky leaks and crashes!
• Use --abort-on-uncaught-exception and you can use Manta and postmortem debugging to debug more quotidian programming errors!
Node.js in production!
• For us at Joyent, the tooling that we have built into node.js has resulted in what we believe to be the best dynamic environment for production use
• Yes, even when compared to much older platforms like Java and Erlang...
• There is still work to be done, especially around add-on development (see TJ’s shim work!) and potentially better bundling of objects…
• We will continue to emphasize production deployment and use in our stewardship of node.js!
Thank you
• @dapsays, the Patron Saint of node.js in production, for DTrace support, MDB support, node-vasync, Manta, etc.
• @mcavage for node-restify, node-fast, Manta, etc.
• @trentmick for node-bunyan
• @chrisandrews for node-dtrace-provider
• @brendangregg for flame graphs
• @tjfontaine for bringing postmortem debugging to an entirely new audience with Linux support for libproc!