NanoLog: A Nanosecond Scale Logging System
forum.stanford.edu/events/posterslides/NanoLogA...

Date posted: 09-Jul-2020
Stephen Yang and John Ousterhout

Problem
Low-latency applications are becoming more popular in the datacenter, and error logging systems (Log4j2, boost, spdlog) are unable to keep up.

Root Causes
The two most expensive operations in a logging system are (a) formatting the message and (b) outputting the message to disk. Together, these can take over 1350ns!

Solution: NanoLog
NanoLog is a logging system that exposes a printf-like API and is 10-100x faster than its competitors by shifting work out of the runtime. It defers formatting work to an offline process and reduces the amount of data logged by extracting static information at compile time.

NanoLog Performance
• Achieves 60 million logs/second at a median latency of 12.5ns
• 99.998% tail latency is better than the median of competitors

Setup
• Log message format: 58-byte string + time/severity context
• 100M log messages measured back-to-back at the API level

How is NanoLog so Fast?
• Key bottlenecks in other systems:
  • Compute: Formatting is an expensive operation. Generating the message below (which has 7 parameters) takes over 850ns
  • Output Bandwidth: On a 250MB/s disk, outputting the 129-byte message below takes at least 500ns
• How does NanoLog avoid these problems?
  • Compute: Defer formatting to an offline process and output binary values at runtime
  • Output Bandwidth: Extract static data at compile time
  • In the log message below, the source file name, line number, function, severity level and user format string are known at compile time, so they can be extracted. This leaves only the time and the ratio at the end as dynamic values.
  • Applying these two techniques, the 129-byte message below becomes just 16 bytes

Sample Log Message:
1473057128.133777014 src/LogCleaner.cc:826 in TombstoneRatioBalancer NOTICE: Using tombstone ratio balancer with ratio = 0.400000
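
The size and bandwidth figures above can be checked with a small C++ sketch. It is illustrative only and is not NanoLog's actual code: `DynamicRecord`, `writeRecord`, `readRecord` and `transferTimeNs` are hypothetical names, the 16 bytes are assumed to be an 8-byte timestamp plus an 8-byte `double` ratio (the poster gives only the total), and 250MB/s is read as 250 × 10^6 bytes per second.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical record holding only the dynamic values of the sample message.
// Assumption: 8-byte timestamp + 8-byte ratio = the 16 bytes quoted above.
struct DynamicRecord {
    std::uint64_t timestamp;  // raw clock value, formatted later offline
    double ratio;             // the one runtime argument of the sample message
};
static_assert(sizeof(DynamicRecord) == 16, "expected a 16-byte record");

// Runtime side: copy the raw binary values, with no string formatting.
inline std::size_t writeRecord(char* out, std::uint64_t timestamp, double ratio) {
    const DynamicRecord rec{timestamp, ratio};
    std::memcpy(out, &rec, sizeof(rec));
    return sizeof(rec);
}

// Offline side: recover the values so they can be formatted after the fact.
inline DynamicRecord readRecord(const char* in) {
    DynamicRecord rec;
    std::memcpy(&rec, in, sizeof(rec));
    return rec;
}

// Time in nanoseconds to push `bytes` through a device of the given bandwidth.
inline double transferTimeNs(std::size_t bytes, double bytesPerSecond) {
    assert(bytesPerSecond > 0.0);
    return static_cast<double>(bytes) / bytesPerSecond * 1e9;
}
```

Under these assumptions, `transferTimeNs(129, 250e6)` comes to about 516ns, consistent with the "at least 500ns" figure for the formatted message, while `transferTimeNs(16, 250e6)` comes to about 64ns for the binary record.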

The NanoLog Pipeline
Compile-Time: To use NanoLog, sources are run through the NanoLog preprocessor, which separates the dynamic and static log components into the Runtime and Decompressor/Aggregator components respectively.
Runtime: User code invoking NANO_LOG interacts with the Runtime library to output a compacted log containing only the dynamic values.
Decompressor/Aggregator: During or after execution, the user can run the Decompressor/Aggregator to transform the compacted log into a human-readable log.

Preprocessor Component
Primary Responsibilities:
• Extract the static information embedded in the user log invocations and save it to the NanoLog Library (as C++ source)
• Replace user NANO_LOG() invocations with optimized code that records only the dynamic information (i.e. time, id, params) at runtime

Runtime Component
Primary Responsibilities:
• Ensure low NANO_LOG() caller overhead by using per-thread buffering queues and logging minimal information (see preprocessor)
• Maintain a background thread to poll through the per-thread buffers
• Use rudimentary compaction techniques (finding the smallest container & using deltas) to save on IO without compromising compute time

[Figure: Application Executable, showing User Thread n (three shown), Generated Code and NanoLog Runtime, with the two record layouts below]
Per-Thread Staging Buffer entry: Unique id (4 bytes) | Byte Size (4 bytes) | Time (8 bytes) | args (n bytes) | ....
Compact Log entry: [1-byte Header] [1-4 byte Unique Id] [1-8 byte Time diff] [0-4 byte size] [0-n bytes params] ....
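
The two compaction techniques named above (smallest container and deltas) can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the NanoLog wire format: the names `bytesNeeded`, `appendLittleEndian`, `StagedEntry` and `compactEntry` are hypothetical, the header packing (time-diff width in the high nibble, id width in the low nibble) is an assumption, and the size and params fields are left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Smallest container: the fewest bytes (1..8) that can hold v.
inline unsigned bytesNeeded(std::uint64_t v) {
    unsigned n = 1;
    while (n < 8 && (v >> (8 * n)) != 0) ++n;
    return n;
}

// Append the low n bytes of v, least significant byte first.
inline void appendLittleEndian(std::vector<std::uint8_t>& out,
                               std::uint64_t v, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Hypothetical staging-buffer entry: fixed-width id and time (args omitted).
struct StagedEntry {
    std::uint32_t id;    // unique id of the log statement
    std::uint64_t time;  // absolute timestamp
};

// Compact one entry: store the time as a delta from the previous entry and
// write both fields in their smallest containers. Returns the bytes appended.
// Assumed header layout: high nibble = time-diff width, low nibble = id width.
inline std::size_t compactEntry(std::vector<std::uint8_t>& out,
                                const StagedEntry& e,
                                std::uint64_t& lastTime) {
    assert(e.time >= lastTime);  // timestamps are assumed non-decreasing
    const std::uint64_t delta = e.time - lastTime;
    const unsigned idBytes = bytesNeeded(e.id);      // 1..4 for a 32-bit id
    const unsigned timeBytes = bytesNeeded(delta);   // 1..8
    out.push_back(static_cast<std::uint8_t>((timeBytes << 4) | idBytes));
    appendLittleEndian(out, e.id, idBytes);
    appendLittleEndian(out, delta, timeBytes);
    lastTime = e.time;
    return 1 + idBytes + timeBytes;
}
```

With these assumptions, an entry whose id and time delta both fit in one byte shrinks from 12 bytes of fixed-width id and time to 3 bytes. The work per entry is a few shifts and comparisons, which is in keeping with the stated aim of saving IO without compromising compute time.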
