NanoLog: A Nanosecond Scale Logging System
Stephen Yang and John Ousterhout

NanoLog Performance
• Achieves 60 million logs/second at a median latency of 12.5ns
• Its 99.998th-percentile tail latency is lower than the median latency of competing systems

Setup
• Log message format: 58-byte string + time/severity context
• 100M log messages measured back-to-back at the API level
How is NanoLog so Fast?
• Key bottlenecks in other systems:
• Compute: Formatting is an expensive operation. Generating the message below (which has 7 parameters) takes over 850ns.
• Output Bandwidth: On a 250MB/s disk, outputting the 129-byte message below takes at least 500ns.
• How does NanoLog avoid these problems?
• Compute: Defer formatting to an offline process and output binary values at runtime.
• Output Bandwidth: Extract static data at compile time.
• In the log message below, the source file name, line number, function, severity level, and user format string are known at compile time, so they can be extracted. This leaves only the time and the ratio at the end as dynamic values.
• Applying these two techniques shrinks the 129-byte message below to just 16 bytes.
Sample Log Message:
1473057128.133777014 src/LogCleaner.cc:826 in TombstoneRatioBalancer NOTICE: Using tombstone ratio balancer with ratio = 0.400000
Problem
Low-latency applications are becoming more popular in the datacenter, and existing logging systems (Log4j2, Boost, spdlog) are unable to keep up.
Root Causes
The two most expensive operations in a logging system are (a) formatting the message and (b) outputting the message to disk. Together, these can take over 1350ns!
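The 1350ns total follows from adding the two bottleneck costs quoted above. A quick check of the output-bandwidth term, assuming 1 MB means 10^6 bytes for the 250MB/s disk:

$$
\begin{aligned}
t_{\text{output}} &= \frac{129\ \text{B}}{250 \times 10^{6}\ \text{B/s}} \approx 516\ \text{ns} \\
t_{\text{format}} + t_{\text{output}} &> 850\ \text{ns} + 500\ \text{ns} = 1350\ \text{ns}
\end{aligned}
$$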
Solution: NanoLog
NanoLog is a logging system that exposes a printf-like API and is 10-100x faster than its competitors because it shifts work out of the runtime. It defers formatting to an offline process and reduces the amount of data logged by extracting static information at compile time.
The NanoLog Pipeline
Compile-Time: To use NanoLog, sources are run through the NanoLog preprocessor, which separates the dynamic and static log components into the Runtime and Decompressor/Aggregator components, respectively.
Runtime: User code invoking NANO_LOG() interacts with the Runtime library, which outputs a compacted log containing only the dynamic values.
Decompressor/Aggregator: During or after execution, the user can run the Decompressor/Aggregator to transform the compacted log into a human-readable log.
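To make the deferred-formatting step concrete, here is a minimal sketch of what a Decompressor/Aggregator could do with one record, assuming the static parts of the sample message live in a table indexed by a unique id. The names `StaticLogInfo`, `kLogInfos`, `DynamicRecord`, and `formatRecord` are hypothetical and not NanoLog's actual code; the point is only that the expensive `snprintf` work runs offline, combining compile-time data with the few values logged at runtime.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical static information for one log statement, as a preprocessor
// might extract it at compile time (field names are illustrative).
struct StaticLogInfo {
    const char* file;
    int line;
    const char* function;
    const char* severity;
    const char* format;  // user format string; this one takes a single double
};

// Hypothetical table indexed by the unique id recorded at runtime.
static const StaticLogInfo kLogInfos[] = {
    {"src/LogCleaner.cc", 826, "TombstoneRatioBalancer", "NOTICE",
     "Using tombstone ratio balancer with ratio = %lf"},
};

// The dynamic values the runtime recorded for one invocation.
struct DynamicRecord {
    uint32_t id;       // index into kLogInfos
    uint64_t seconds;  // timestamp: whole seconds
    uint32_t nanos;    // timestamp: nanoseconds within the second
    double ratio;      // the statement's only dynamic argument
};

// Offline formatting: merge static and dynamic parts into readable text.
std::string formatRecord(const DynamicRecord& r) {
    const StaticLogInfo& info = kLogInfos[r.id];
    char message[256];
    std::snprintf(message, sizeof(message), info.format, r.ratio);
    char line[512];
    std::snprintf(line, sizeof(line), "%llu.%09u %s:%d in %s %s: %s",
                  static_cast<unsigned long long>(r.seconds),
                  static_cast<unsigned>(r.nanos), info.file, info.line,
                  info.function, info.severity, message);
    return std::string(line);
}
```

Calling `formatRecord({0, 1473057128ULL, 133777014u, 0.4})` reproduces the 129-byte sample message; because this runs in a separate process, the formatting cost never lands on the application's logging path.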
Preprocessor Component
Primary Responsibilities:
• Extract the static information embedded in the user log invocations and save it to the NanoLog library (as C++ source)
• Replace user NANO_LOG() invocations with optimized code that records only the dynamic information (i.e., time, id, params) at runtime
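A rough sketch of the kind of code a preprocessor could substitute for one invocation, assuming a call shaped like `NANO_LOG(NOTICE, "Using tombstone ratio balancer with ratio = %lf", ratio)` and an entry layout of 4-byte id, 4-byte size, 8-byte time, then arguments. The function `recordTombstoneRatio`, the id constant, and the use of `std::chrono::steady_clock` (a cheaper cycle counter could be substituted) are all hypothetical illustrations, not NanoLog's generated output.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-size prefix of one runtime entry: 4-byte id, 4-byte size, 8-byte time.
struct EntryHeader {
    uint32_t id;         // unique id assigned to the log statement at compile time
    uint32_t size;       // total entry size in bytes, header included
    uint64_t timestamp;  // raw clock reading
};

// Append n raw bytes to the buffer.
inline void appendBytes(std::vector<char>& buf, const void* src, std::size_t n) {
    const char* p = static_cast<const char*>(src);
    buf.insert(buf.end(), p, p + n);
}

// Hypothetical id the preprocessor assigned to this statement.
constexpr uint32_t kTombstoneRatioLogId = 0;

// Hypothetical replacement for
//   NANO_LOG(NOTICE, "Using tombstone ratio balancer with ratio = %lf", ratio);
// File, line, function, severity and format string are not written: they are
// recoverable offline from the id. Only id, size, time and the argument are stored.
inline void recordTombstoneRatio(std::vector<char>& buf, double ratio) {
    EntryHeader h;
    h.id = kTombstoneRatioLogId;
    h.size = static_cast<uint32_t>(sizeof(EntryHeader) + sizeof(ratio));
    h.timestamp = static_cast<uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
    appendBytes(buf, &h, sizeof(h));
    appendBytes(buf, &ratio, sizeof(ratio));
}
```

Each call appends 24 bytes (a 16-byte header plus an 8-byte double) and performs no string formatting at all; the text `0.400000` is only produced later, offline.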
Runtime Component
Primary Responsibilities:
• Ensure low NANO_LOG() caller overhead by using per-thread buffering queues and logging minimal information (see Preprocessor Component)
• Maintain a background thread that polls the per-thread buffers
• Use rudimentary compaction techniques (finding the smallest container and using deltas) to save on I/O without compromising compute time
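The per-thread buffering and background polling described above can be sketched with a lock-free single-producer/single-consumer queue per thread: the logging thread only writes its own buffer, and one background thread drains all of them. This is a minimal sketch under those assumptions; `StagingBuffer` and `pollBuffers` are hypothetical names, and NanoLog's real buffers may differ in layout and synchronization.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Minimal single-producer/single-consumer byte queue, one per user thread.
// The owning thread pushes; only the background thread drains.
class StagingBuffer {
public:
    explicit StagingBuffer(std::size_t capacity) : data_(capacity) {}

    // Producer: copy n bytes in, or return false if the buffer lacks room.
    bool push(const void* src, std::size_t n) {
        std::size_t head = head_.load(std::memory_order_relaxed);
        std::size_t tail = tail_.load(std::memory_order_acquire);
        if (data_.size() - (head - tail) < n) return false;
        const char* p = static_cast<const char*>(src);
        for (std::size_t i = 0; i < n; ++i)
            data_[(head + i) % data_.size()] = p[i];
        head_.store(head + n, std::memory_order_release);  // publish bytes
        return true;
    }

    // Consumer: append every published byte to out; returns bytes moved.
    std::size_t drain(std::vector<char>& out) {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        std::size_t head = head_.load(std::memory_order_acquire);
        for (std::size_t i = tail; i != head; ++i)
            out.push_back(data_[i % data_.size()]);
        tail_.store(head, std::memory_order_release);  // free the space
        return head - tail;
    }

private:
    std::vector<char> data_;
    std::atomic<std::size_t> head_{0};  // total bytes ever pushed
    std::atomic<std::size_t> tail_{0};  // total bytes ever drained
};

// Background thread body: poll every per-thread buffer until asked to stop,
// then do one final sweep so nothing published before the stop is lost.
inline void pollBuffers(const std::vector<StagingBuffer*>& buffers,
                        std::atomic<bool>& stop, std::vector<char>& sink) {
    while (!stop.load(std::memory_order_acquire)) {
        for (StagingBuffer* b : buffers) b->drain(sink);
        std::this_thread::yield();
    }
    for (StagingBuffer* b : buffers) b->drain(sink);
}
```

Because each buffer has exactly one writer and one reader, the caller's hot path is a bounds check, a copy, and one release store, with no locks shared between logging threads.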
Figure: NanoLog architecture. Inside the application executable, each user thread (1 to n) runs generated code that writes into its own per-thread staging buffer in the NanoLog Runtime, which outputs the compact log.
• Per-Thread Staging Buffer entry: Unique id (4 bytes) | Byte Size (4 bytes) | Time (8 bytes) | args (n bytes) | ...
• Compact Log entry: [1-byte Header][1-4 byte Unique Id][1-8 byte Time diff][0-4 byte size][0-n bytes params] ...
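The step from the fixed-width staging entry (4-byte id, 4-byte size, 8-byte time, args) to the variable-width compact entry can be sketched as below, assuming "smallest container" means storing each integer in the fewest bytes that hold it and "deltas" means storing time relative to the previous entry. The header bit layout, the rule that size is omitted when there are no params, and the names `bytesNeeded`, `putBytes`, and `compactEntry` are all assumptions for illustration; the poster does not specify NanoLog's exact encoding.

```cpp
#include <cstdint>
#include <vector>

// Smallest container: how many bytes (1-8) are needed to hold v.
inline int bytesNeeded(uint64_t v) {
    int n = 1;
    while (n < 8 && (v >> (8 * n)) != 0) ++n;
    return n;
}

// Append the low n bytes of v in little-endian order.
inline void putBytes(std::vector<uint8_t>& out, uint64_t v, int n) {
    for (int i = 0; i < n; ++i)
        out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Compact one staging entry (4-byte id, 8-byte absolute time, args) into
//   [1-byte header][1-4 byte id][1-8 byte time diff][0-4 byte size][params]
// Header bits (an assumed layout): 0-1 = id bytes - 1,
// 2-4 = time-diff bytes - 1, 5-7 = size bytes (0 when there are no params).
// Assumes timestamps never decrease and args.size() fits in 32 bits.
void compactEntry(std::vector<uint8_t>& out, uint32_t id, uint64_t time,
                  uint64_t& prevTime, const std::vector<uint8_t>& args) {
    uint64_t diff = time - prevTime;  // delta against the previous entry
    prevTime = time;
    int idBytes = bytesNeeded(id);
    int diffBytes = bytesNeeded(diff);
    int sizeBytes = args.empty() ? 0 : bytesNeeded(args.size());
    out.push_back(static_cast<uint8_t>((idBytes - 1) | ((diffBytes - 1) << 2) |
                                       (sizeBytes << 5)));
    putBytes(out, id, idBytes);
    putBytes(out, diff, diffBytes);
    putBytes(out, args.size(), sizeBytes);
    out.insert(out.end(), args.begin(), args.end());
}
```

Under these assumptions, an entry whose id and time delta each fit in one byte and that carries 8 bytes of params compacts to 12 bytes, versus 24 bytes in the staging layout. The encoding is a few shifts and comparisons per field, which is why this style of compaction saves I/O without adding much compute.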