TX/RX 101: Transfer data efficiently

Date post: 01-Sep-2014
Upload: lourens-naude
Description:
The recent explosion in products supporting multiple devices requires exposing data through lightweight services and data-driven APIs. We're shifting data across flaky networks, massaging it between different storage backends, encoding / decoding by protocol, and frequently driving all of this from Cloud platforms with highly variable Quality of Service and availability guarantees (if any). Yet most engineering teams aren't up to speed with POSIX and POSIX.1b (Realtime extensions), the APIs exposed by most modern production systems for I/O. In this talk I'll look at a few very often misunderstood concepts, as well as challenge others.
* Understanding buffered I/O and how to sustain throughput for fast and slow clients alike
* Blocking VS Non-blocking I/O
* POSIX Lies - compliance isn't always as it should be, in some cases *worse* than CSS and other specifications "supported" by modern browsers
* Asynchronous I/O myths and why I/O concurrency is often an illusion
* Why and how Windows I/O Completion Ports knock the socks off most UNIX systems
* I/O multiplexing gotchas
* The Cloud: managing latency with variable network and storage characteristics
* High throughput system calls
Transcript
Page 1: TX/RX 101: Transfer data efficiently

Wildfire Interactive, Inc. | 1600 Seaport Boulevard, Suite 500, Redwood City, CA 94063 | (888) 274-0929

Tx/Rx 101: Transfer data efficiently

Lourens Naudé

SAPO Codebits 2011, Lisbon - http://codebits.eu

Thursday, November 10, 11

Page 2: TX/RX 101: Transfer data efficiently

Operations @ WildfireApp.com

Bio

A few months ago at WildfireApp.com I made the transition from a developer background and role to the operations team. With an engineering mind it’s easy to get distracted by lower level details and lose sight of the bigger picture. During this time we’ve released several products, and this changed my perspective on operational complexity, moving parts and the importance of getting and keeping stable software in front of customers. Less is more. There will be some notes on this during today’s talk as well.

Page 3: TX/RX 101: Transfer data efficiently

All modern software should represent a connected world.

Communication

Lots of trending products are actually platforms, spread across multiple devices. API and integration driven, like Twitter. Even this event exposes various APIs for projects to consume in the coming days. Software is social.

Page 4: TX/RX 101: Transfer data efficiently

Most social problems are communication problems.

Systems aren’t any different

Communication

Two outstanding individuals can engage in a relationship, but it’s bound to fail if there’s little or no communication.

Page 5: TX/RX 101: Transfer data efficiently

All systems GO !

http://www.flickr.com/photos/23968709@N03/5452362745/

Page 6: TX/RX 101: Transfer data efficiently

Efficiency

* Performant individual requests
* Scalability
* Operational complexity
* Minimal / no developer onboarding
* Cost of ownership (today and forward looking)

Efficiency means different things to different people. It’s not just about performant individual requests, but also about scaling those out to multiple clients. Operational complexity and moving parts should be kept to a minimum. Often overlooked, especially in startup environments, are the onboarding costs of new developers, especially in the presence of lower level / funky protocols etc. that require a bootstrap period. Rule of thumb: your cowboy should get it. All of this directly affects the bottom line.

Page 7: TX/RX 101: Transfer data efficiently

To maintain acceptable response times for more clients, with the same or less infrastructure

Efficient Scalability

This is representative of most public facing production deployments. Remember, it's not about the speed of a single request, but about being able to easily handle increased throughput.

Page 8: TX/RX 101: Transfer data efficiently

Agenda

* System calls and file descriptors
* Blocking, nonblocking and async I/O
* Multiplexed I/O
* There’s a better way ...
* Optimizations

We’ll allocate time to system calls and file descriptors (the canonical reference used by user space apps for I/O) and how all of that fits into the common I/O models: blocking, nonblocking and async. The next challenge is applying this knowledge at scale using multiplexed I/O; however, apparently the world is coming to an end in 2012 and there are much better alternatives. We’ll then phase out with a few common optimizations.

Page 9: TX/RX 101: Transfer data efficiently

Head count ?

Anyone currently consuming or exposing APIs ? Messing around with or using any of these technologies in production ?

Page 10: TX/RX 101: Transfer data efficiently

Mediates access to system resources: I/O, memory and CPU

The kernel

Since the 60s, nothing has really changed much in how we access system resources. First line of communication: the exchange between the Kernel and User space is a negotiation for resources.

Page 11: TX/RX 101: Transfer data efficiently

Sandboxed execution for security and stability requirements

User mode

The vast majority of processes, especially scripting languages, will spend most of their on-CPU time in user mode. How do user processes access system resources ?

Page 12: TX/RX 101: Transfer data efficiently

System calls

Kernel

Ruby process

Ruby process

disk

read(5, &buf, 4000)

Ask the Kernel to do work on behalf of a user process.

Page 13: TX/RX 101: Transfer data efficiently

A protected function call from user to kernel space with the intent to interact with a system resource

Syscall definition

Page 14: TX/RX 101: Transfer data efficiently

Characteristics

* Uniform API on POSIX systems
* Ditto for behavior. LIES !
* Slow emulations

ssize_t read(int d, void *buf, size_t nbytes)

ssize_t write(int d, const void *buf, size_t nbytes)

It's also guaranteed to have a uniform API and behavior on POSIX compliant (read: most UNIX) systems. Just like HTML and CSS compliance in modern browsers, most operating system implementations do not conform 100% to spec. Configure checks at compile time may confirm the presence of a performant system call when in fact it’s just an emulation.

Page 15: TX/RX 101: Transfer data efficiently

Syscall performance

* MUCH slower than function calls
* 20x and more
* Definite context switch
* Calls can block - slow upstream peers

An important performance characteristic of system calls is that they're an order of magnitude more expensive than e.g. a libc function call in the same process. In other words, there's a context switch that affects several subsystems, e.g. swapping out CPU registers and memory references.

They may also block indefinitely or until a condition's met, e.g. a full buffer on a read syscall.

Page 16: TX/RX 101: Transfer data efficiently

Who doesn't know what file descriptors or file handles are ?

Page 17: TX/RX 101: Transfer data efficiently

fd = open("/path/file", O_RDONLY);
read(fd, &buf, 4000);

open syscall

OPEN system call to open a file. Give it a path and it returns a resource, which is used in subsequent READ and WRITE calls for I/O.
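
A minimal C sketch of the open / read sequence on this slide. The helper name `read_head` is hypothetical, and `O_RDONLY` is assumed as the read-only flag for `open`.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open `path`, read up to `cap` bytes into `buf`,
 * then close the descriptor. Returns the byte count, or -1 on error. */
ssize_t read_head(const char *path, void *buf, size_t cap)
{
    int fd = open(path, O_RDONLY);   /* syscall: returns a numeric descriptor */
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, buf, cap);  /* syscall: may return fewer than cap bytes */
    close(fd);                       /* syscall: hand the descriptor back */
    return n;
}
```

Each of the three calls crosses into the kernel, so even this tiny helper pays for three system calls.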

Page 18: TX/RX 101: Transfer data efficiently

Examples

* local file
* socket
* directory
* pipe
* Consistent API, semantics differ

A file descriptor may represent a local file, a socket, a directory, a pipe etc. Semantics for these resources differ immensely, even though the canonical reference is a numeric handle and the basic read and write system call APIs are equivalent. To the kernel, it's only ever a device.
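
To illustrate the consistent API, here is a small C sketch in which the same read / write helper is pointed first at a pipe and then at a UNIX-domain socket pair. All function names are hypothetical and the 64 byte scratch buffer is an arbitrary choice for the sketch.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: push `len` bytes into `wfd` and pull them back out of
 * `rfd` using only write(2) and read(2). Returns 0 if the bytes round-trip
 * intact, -1 otherwise. */
int roundtrip(int wfd, int rfd, const char *msg, size_t len)
{
    char back[64];
    if (len > sizeof back)
        return -1;
    if (write(wfd, msg, len) != (ssize_t)len)
        return -1;

    size_t got = 0;
    while (got < len) {              /* a read may return a partial result */
        ssize_t n = read(rfd, back + got, len - got);
        if (n <= 0)
            return -1;
        got += (size_t)n;
    }
    return memcmp(msg, back, len) == 0 ? 0 : -1;
}

/* The helper works on a pipe ... */
int roundtrip_pipe(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    int rc = roundtrip(p[1], p[0], "ping", 4);
    close(p[0]);
    close(p[1]);
    return rc;
}

/* ... and, unchanged, on a connected socket pair. */
int roundtrip_socketpair(void)
{
    int s[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, s) != 0)
        return -1;
    int rc = roundtrip(s[0], s[1], "ping", 4);
    close(s[0]);
    close(s[1]);
    return rc;
}
```

The calls are identical in both cases; what differs is everything the sketch does not exercise, such as buffering limits, seeking and behavior on disconnect.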

Page 19: TX/RX 101: Transfer data efficiently

Definition

* Numeric handle
* References a kernel allocated resource
* Kernel and user space buffer

Kernel buffer

User buffer

/path/serv.sock( fd 5 )

A file descriptor (or handle) is a numeric handle representing an I/O resource allocated by the Kernel, with metadata in the Kernel and buffers in both Kernel and User space. SIMPLE and EFFICIENT contract.

Page 20: TX/RX 101: Transfer data efficiently

A request to read a chunk of data from a descriptor may block depending on buffer state

Blocking I/O

File descriptors are blocking by DEFAULT. Easiest I/O model, but also unpredictable with the worst performance guarantees.

Page 21: TX/RX 101: Transfer data efficiently

Blocked buffer state

[Diagram: kernel buffer and user buffer, labelled 2000 and 1000, with the caller issuing read(5, &b, 4000)]

Kernel puts the caller process in a waiting state. The application ( execution thread ) sleeps until the I/O operation has completed (or has generated an error) at which point it is scheduled to run again. Simple to use, but many downsides. Domino effect on connected peers.

Page 22: TX/RX 101: Transfer data efficiently

fd = socket(PF_INET, SOCK_STREAM, 0);
fcntl(fd, F_SETFL, O_NONBLOCK);

Nonblocking I/O

The POSIX spec allows for changing this behavior by setting the O_NONBLOCK flag on a descriptor.
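
A slightly more careful C sketch of the same idea: calling `F_SETFL` with only `O_NONBLOCK` replaces the status flags already on the descriptor, so this version reads them with `F_GETFL` first. The helper names `set_nonblocking` and `is_nonblocking` are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: switch `fd` to nonblocking mode while preserving the
 * status flags already set on it. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Reports 1 if O_NONBLOCK is set on `fd`, 0 if it is not, -1 on error. */
int is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) ? 1 : 0;
}
```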

Page 23: TX/RX 101: Transfer data efficiently

A read request would only be initiated if the system call won’t block

Nonblocking I/O

Page 24: TX/RX 101: Transfer data efficiently

[Diagram: kernel buffer and user buffer, labelled 2000 and 1000; read(5, &b, 4000) comes back with EAGAIN]

In this scenario, read returns the 1000 bytes that are available (a partial read). A further read while nothing is buffered returns -1 with errno set to EAGAIN, indicating that we need to issue the read request again.

Lots of system call overhead - constantly making system calls to see if I/O is ready. Latency can be high if I/O arrives and a system call is not made for a while. Efficiency is very poor due to the constant polling with system calls from user space.

Page 25: TX/RX 101: Transfer data efficiently

Don’t drink the kool aid

* Guideline only
* Not free - invokes a syscall
* Not supported by all devices

Do note that this behavior is thus just a guideline for the caller. We’re told to retry, just not when. It's not free either - there's the context switch as well as system call argument validation in the Kernel.

There's a considerable grey area, as per POSIX spec, that ignores O_NONBLOCK semantics on any file handle that references a file on a block device. This has an important performance impact on systems that access block devices with variable performance characteristics, like EBS @ Amazon AWS. Not all system calls respect non-blocking I/O either ( libeio / libuv ).

Page 26: TX/RX 101: Transfer data efficiently

Nonblocking I/O IS NOT asynchronous I/O

Async I/O myth

File descriptors change state ( read / write ) asynchronously at the Kernel layer, we handle those events in User space. A NIC receives data which is buffered in the kernel. We still need to ask for and copy it. The takeaway, and what you MUST remember here is that this IS NOT asynchronous I/O.

An event occurs asynchronously, but is handled synchronously.

Page 27: TX/RX 101: Transfer data efficiently

Async I/O

* Callbacks and notifications
* Issues I/O in parallel
* Direct I/O (no double buffering)
* Popular with database systems
* Tell the Kernel everything it needs to complete an I/O job

Helps maximize I/O throughput by allowing lots of I/O to be issued in parallel. Network I/O is frequently not supported.

Page 28: TX/RX 101: Transfer data efficiently

Application Kernel

submit job

data, notify

"data"

callback()

fd : 10 op : read notify : signal buffer : 0x100431 size : 4

Page 29: TX/RX 101: Transfer data efficiently

POSIX Realtime Extension

* aio_read, aio_write etc.
* Supports file I/O only

Page 30: TX/RX 101: Transfer data efficiently

case SIGEV_NONE:
        break;

case SIGEV_THREAD:
        /* Unsupported [RTS] */

default:
        return (EINVAL);

OS X Lion - AIO support

The preferred notification mechanism is a signal ( SIGIO ), but it doesn’t support passing any metadata either. Thus if there are 10 jobs submitted, we still need to figure out which of those jobs are in a completed state.
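
As a rough C sketch of the job submission model, the following submits a single POSIX AIO read with `SIGEV_NONE` (no signal) and then asks the kernel about that specific job with `aio_error`. The helper name `aio_read_once` is hypothetical, the sketch waits rather than doing other work, and on some older systems the program may need to be linked with `-lrt`.

```c
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: submit one asynchronous read job for `fd` at `offset`
 * and wait for it to complete. Returns the bytes read, or -1 on error. */
ssize_t aio_read_once(int fd, void *buf, size_t cap, off_t offset)
{
    struct aiocb job;
    memset(&job, 0, sizeof job);
    job.aio_fildes = fd;                         /* which descriptor */
    job.aio_buf = buf;                           /* where the data should land */
    job.aio_nbytes = cap;                        /* how much to read */
    job.aio_offset = offset;                     /* where in the file to start */
    job.aio_sigevent.sigev_notify = SIGEV_NONE;  /* no signal: we check ourselves */

    if (aio_read(&job) != 0)                     /* submit the job */
        return -1;

    /* A real application would do other work here; this sketch just sleeps
     * until the job is no longer in progress. */
    const struct aiocb *const pending[1] = { &job };
    while (aio_error(&job) == EINPROGRESS) {
        if (aio_suspend(pending, 1, NULL) != 0 && errno != EINTR)
            return -1;
    }

    if (aio_error(&job) != 0)
        return -1;
    return aio_return(&job);                     /* collect the result once */
}
```

With ten jobs in flight, the completion check becomes a loop over ten control blocks, which is the bookkeeping the note above refers to.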

Page 31: TX/RX 101: Transfer data efficiently

Windows I/O Completion Ports

* Supports all descriptor types
* Very mature and stable technology

Redmond has given us malware, but has done some things right as well.

Page 32: TX/RX 101: Transfer data efficiently

Recap

* O_NONBLOCK is a guideline only
* Invoked in user space
* Data transfer in user space
* Blocking and nonblocking I/O terms

Moving forward, we shall prefer the terms blocking and non-blocking I/O, as the work done transferring data from Kernel to User space buffers is still done in user space. Even if we do change descriptor behavior with the O_NONBLOCK flag, it's merely a guideline for the caller to choose a strategy, as the intent and additional context is known there.

Page 33: TX/RX 101: Transfer data efficiently

Challenges of scale

* Large number of file descriptors
* Respond to descriptor state changes
* Execute domain logic

SINGLE FDs until now. Remember, the primary use case is increased throughput. The main challenge here is handling a large number of file descriptors (representing client connections) efficiently in a single process. Previous examples represented single descriptors only.

Page 34: TX/RX 101: Transfer data efficiently

Who's familiar with select, poll, epoll or kqueue ?

Multiplexed I/O

Page 35: TX/RX 101: Transfer data efficiently

Multiplexed I/O

* Nonblocking I/O is inefficient
* Retry on EAGAIN is polling
* Multiplexed I/O: concurrent blocking
* Notified of state changes on any registered fd

Nonblocking I/O is inefficient - we continuously need to retry on EAGAIN error conditions and as such waste resources polling. Multiplexed I/O allows an application to concurrently block on a set of file descriptors (thousands with epoll and kqueue) and receive notification when any of them are in a readable or writable state. We aim to block on a set of descriptors, not just 1.

Page 36: TX/RX 101: Transfer data efficiently

Multiplexed I/O

[Diagram: an I/O bound App and a Multiplexer across the user space / kernel space boundary, with fds 5 to 12 registered]

At any given time a small subset of the descriptors registered with the multiplexer would be in a readable or writable state.

Page 37: TX/RX 101: Transfer data efficiently

1. Tell me when fds 1..200 are ready for a read or a write

Multiplexed I/O

Page 38: TX/RX 101: Transfer data efficiently

2. Do other work until one of them's ready

Multiplexed I/O

Page 39: TX/RX 101: Transfer data efficiently

3. Get woken up by the multiplexer on state changes

Multiplexed I/O

Page 40: TX/RX 101: Transfer data efficiently

4. Which of fd set 1..200 are in a ready state ?

Multiplexed I/O

Page 41: TX/RX 101: Transfer data efficiently

5. Handle all fds ready for I/O

Multiplexed I/O

Repeat. There will almost always be just a small subset of fds ready to drive work from. The more we have, the smaller the ready : total ratio generally becomes.
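
The five steps can be sketched in C with `poll`, the most portable of the multiplexers mentioned in the talk. The helper name `wait_readable` and the `MAX_FDS` limit are assumptions made for this example; epoll and kqueue follow the same outline with different registration calls.

```c
#include <poll.h>
#include <stddef.h>

/* Arbitrary limit so the sketch needs no dynamic allocation. */
enum { MAX_FDS = 64 };

/* Hypothetical helper: block until at least one of `fds[0..count-1]` is
 * readable or `timeout_ms` elapses, then copy the readable descriptors into
 * `ready`. Returns how many are ready, 0 on timeout, -1 on error. */
int wait_readable(const int *fds, int count, int *ready, int timeout_ms)
{
    struct pollfd set[MAX_FDS];
    if (count < 0 || count > MAX_FDS)
        return -1;

    /* Step 1: register interest in every descriptor. */
    for (int i = 0; i < count; i++) {
        set[i].fd = fds[i];
        set[i].events = POLLIN;
        set[i].revents = 0;
    }

    /* Steps 2-3: sleep in the kernel until a state change wakes us up.
     * A timeout of 0 returns immediately, leaving the caller free to do
     * other work between checks. */
    int n = poll(set, (nfds_t)count, timeout_ms);
    if (n <= 0)
        return n;

    /* Step 4: find out which descriptors are in a ready state. */
    int found = 0;
    for (int i = 0; i < count; i++)
        if (set[i].revents & (POLLIN | POLLHUP))
            ready[found++] = set[i].fd;

    /* Step 5 is left to the caller: read from each descriptor in `ready`. */
    return found;
}
```

Note that `poll` still walks the whole set on every call, which is one reason epoll and kqueue cope better with thousands of mostly idle descriptors.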

Page 42: TX/RX 101: Transfer data efficiently

Questions ?

Checkpoint.

Page 43: TX/RX 101: Transfer data efficiently

ETOOCOMPLEX

You don’t need a UNIX beard in your department. And you should be building apps.

http://www.flickr.com/photos/taniwha/2237514055/

Page 44: TX/RX 101: Transfer data efficiently

There’s a better way

http://zeromq.org

Recently I’ve spent some free time on a new set of Ruby bindings for ZeroMQ - a communications and concurrency framework. This framework is going to change the way we do I/O moving forward - there are plans for Linux Kernel integration.

Page 45: TX/RX 101: Transfer data efficiently

Scripting for sockets

Super Sockets

We spend copious amounts of time implementing business logic and refactoring, and often don’t think about how individual components will communicate. Requirements also change over time - sockets should be “scriptable” as well. ZeroMQ is not an out-of-the-box technology - it provides the building blocks for solutions. Like legos. This is especially important in service integration - connecting with Facebook, Twitter etc. Social software. Crappy communication patterns with third party integrations are much more likely to affect your performance profiles than inefficient code paths deployed from within your organization.

Page 46: TX/RX 101: Transfer data efficiently

BSD sockets + Messaging patterns

REQ

REP PUSH

PUB

SUB SUB

PULL PULL

* Maps to messaging patterns
* Request-Reply, Publish-Subscribe, Pipeline
* Prefers mailboxes to buffers and is asynchronous by design ( Actor Pattern )
* Build out topologies just in time.
* Allows us to focus on interfaces and patterns first

Page 47: TX/RX 101: Transfer data efficiently

Transport agnostic

* inproc (Threads)
* IPC
* TCP/IP
* Multicast
* Uniform API

* Scales to X cores: threads in a process, processes on a box, boxes on a network
* Same API regardless of transport

Operations can deploy a single process on first service iteration, using threads to utilize cores. If the engineering team’s weak with multithreaded programming ( or the codebase is too complex ), they can move to the IPC transport, which requires managing more processes but allows developers to code for a single thread of execution. If and when capacity is required, a TCP transport allows for scaling to multiple boxes.

Page 48: TX/RX 101: Transfer data efficiently

Brokerless

* Broker is always a SPOF
* Less admin overhead to not rely on message broker.
* Any intermediate is slow - most brokers touch disk extensively for durability
* Sometimes platforms are built around the capabilities of a particular broker implementation

http://www.flickr.com/photos/epsos/5591761716/

Page 49: TX/RX 101: Transfer data efficiently

methodmissing:rbczmq lourens$ ruby perf/local_thr.rb tcp://127.0.0.1:5020 10 1000000
message size: 10 [B]
message count: 1000000
mean throughput: 410276 [msg/s]
mean throughput: 32.822 [Mb/s]

methodmissing:rbczmq lourens$ ruby perf/local_thr.rb tcp://127.0.0.1:5020 1024 1000000
message size: 1024 [B]
message count: 1000000
mean throughput: 176342 [msg/s]
mean throughput: 1444.595 [Mb/s]

methodmissing:rbczmq lourens$ ruby -v
ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-darwin11.0.1]

FAST

* Minimal latency - higher than TCP (which is normal), BUT higher throughput than TCP ( batching )
* Communications and concurrency framework - can load balance between threads and processes as well
* Socket options allow for tuning individual sockets - not system wide as per raw sockets

Page 50: TX/RX 101: Transfer data efficiently

Resilient

* Auto-reconnects
* Durable sockets
* Socket options: backlog, HWM, SWAP
* Clients can come up before servers
* Atomic message delivery

* Manage a socket’s ability to handle variable throughput and not drop messages with the backlog, HWM and SWAP socket options.
* Easy for upgrades if we don’t need to always coerce things in sequence.
* Fault tolerant message passing - atomic - you either get the message (blob), or you don’t

Page 51: TX/RX 101: Transfer data efficiently

REQ REQ REQ

ROUTER

DEALER

REP REP REP

Interjection principle

Interjection principle states that inserting an intermediary node into the topology should not change the behavior at the endpoints.

DEALER / ROUTER pair allows for non-blocking request/response. Great for slow / fast clients alike.

Page 52: TX/RX 101: Transfer data efficiently

X Platform

* Multiple OS’s
* Multiple languages - clients ecosystem
* No message contracts - BLOBS

A good clients ecosystem is so often overlooked. There’s a Syslog module for easily tapping into event streams as needed as well.

Page 53: TX/RX 101: Transfer data efficiently

ZMQ efficiency

* Brokerless (or roll your own)
* Easy to re-architect
* Swap transport to scale to more cores
* Resiliency
* Specialize as needed with socket options

Page 54: TX/RX 101: Transfer data efficiently

Write data from multiple buffers to a single stream ...

Working with multiple buffers

b1

b2

b3

Page 55: TX/RX 101: Transfer data efficiently

... or read data from a single stream to multiple buffers.

b1

b2

b3

Page 56: TX/RX 101: Transfer data efficiently

ssize_t writev(int fildes, const struct iovec *iov, int iovcnt);

ssize_t readv(int fildes, const struct iovec *iov, int iovcnt);

Vectored I/O

Page 57: TX/RX 101: Transfer data efficiently

Benefits

* Atomic - reads / writes all or nothing
* Reduces the number of syscalls
* Fixed sized headers
* Database records
* Well supported

Reduces the number of system calls required to read / write data from non-contiguous buffers.
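
The fixed sized header case can be sketched in C with `readv`. The 4 byte header length, the wire format and the helper name `read_record` are all assumptions made for this example.

```c
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical wire format for this sketch: a fixed 4 byte header followed
 * by a body. */
enum { HEADER_LEN = 4 };

/* One readv(2) call scatters the incoming stream into two separate buffers.
 * Returns the total bytes read across both buffers, or -1 on error. */
ssize_t read_record(int fd, char header[HEADER_LEN], char *body, size_t body_cap)
{
    struct iovec iov[2];
    iov[0].iov_base = header;      /* filled first */
    iov[0].iov_len = HEADER_LEN;
    iov[1].iov_base = body;        /* receives whatever follows the header */
    iov[1].iov_len = body_cap;
    return readv(fd, iov, 2);
}
```

Without `readv` this would take either two read calls or one read into a scratch buffer followed by two copies.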

Page 58: TX/RX 101: Transfer data efficiently

Write example

b1

b2

Page 59: TX/RX 101: Transfer data efficiently

Call write for each buffer

b1

b2 write

write

Page 60: TX/RX 101: Transfer data efficiently

Allocate buffer large enough for both buffers, copy data and write once

b1

b2b3

copy

copy

write

Page 61: TX/RX 101: Transfer data efficiently

Call writev once to output both buffers

b1

b2writev
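
A minimal C sketch of the single writev call, assuming the two buffers are NUL-terminated strings. The helper name `write_pair` is hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send two separate C strings with a single writev(2)
 * call, instead of two write(2) calls or a copy into a scratch buffer.
 * Returns the total bytes written, or -1 on error. */
ssize_t write_pair(int fd, const char *b1, const char *b2)
{
    struct iovec iov[2];
    iov[0].iov_base = (void *)b1;  /* iov_base is not const-qualified */
    iov[0].iov_len = strlen(b1);
    iov[1].iov_base = (void *)b2;
    iov[1].iov_len = strlen(b2);
    return writev(fd, iov, 2);     /* buffers are written in array order */
}
```

The receiver sees one contiguous stream; the buffer boundaries exist only on the sending side.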

Page 62: TX/RX 101: Transfer data efficiently

Higher latency than TCP, but also higher throughput.

ZeroMQ message batching

Page 63: TX/RX 101: Transfer data efficiently

Sending two or more messages down the networking stack is faster than individual messages.

Message batching

Page 64: TX/RX 101: Transfer data efficiently

Characteristics

* Opportunistic batching
* NIC asks 0mq for messages
* Sends all available at any given time
* Low latency for single VS multiple messages use case

Page 65: TX/RX 101: Transfer data efficiently

Zero-Copy sends

* Saves on memcpy()
* Delegate ownership of a buffer to 0mq
* Scripting language strings
* Callback function on send
* Tricky with Garbage Collection

Page 66: TX/RX 101: Transfer data efficiently

void free_msg(void *data, void *s);

zmq_msg_init_data(&m, RSTRING_PTR(s), RSTRING_LEN(s), free_msg, (void*)s);

rb_gc_register_address(&s);
zmq_send(s, &m);

void free_msg(void *data, void *s)
{
    rb_gc_unregister_address(&s);
}

Page 67: TX/RX 101: Transfer data efficiently

"message"

Ruby ZeroMQ

"message" "message"

"message"

"message"

callback()

Message in Ruby VM VS message in ZeroMQ

Page 68: TX/RX 101: Transfer data efficiently

Passing thoughts

* System calls
* Blocking, nonblocking and async I/O
* Messaging patterns ftw!
* Efficiency is context dependent
* Read http://zguide.zeromq.org
* Reread http://zguide.zeromq.org

Page 69: TX/RX 101: Transfer data efficiently

Questions ?

Page 70: TX/RX 101: Transfer data efficiently

Wildfire Interactive, Inc. is hiring

Page 71: TX/RX 101: Transfer data efficiently

http://www.jobscore.com/jobs/wildfireapp

follow @methodmissing
fork github.com/methodmissing

Thanks !
