New sendfile


New sendfile(2)

Gleb Smirnoff [email protected]

FreeBSD Storage Summit
Netflix

20 February 2015

Gleb Smirnoff [email protected] New sendfile(2) 20 February 2015 1 / 23


History of sendfile(2) Before sendfile(2)

Miserable life w/o sendfile(2)

while ((cnt = read(filefd, buf, (u_int)blksize)) > 0 &&
    write(netfd, buf, cnt) == cnt)
        byte_count += cnt;

send_data() in src/libexec/ftpd/ftpd.c, FreeBSD 1.0, 1993
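For comparison, here is a minimal sketch of the same userland copy loop with short writes and EINTR handled, which the 1993 loop above does not do. The function name copy_loop and the 8192-byte block size are assumptions made for illustration, not taken from ftpd.c.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy everything from filefd to netfd through a userland buffer.
 * Returns the number of bytes copied, or -1 on error.
 * Every byte crosses the kernel/user boundary twice, which is the
 * cost sendfile(2) was introduced to avoid.
 */
ssize_t
copy_loop(int filefd, int netfd)
{
	char buf[8192];		/* block size is an arbitrary choice */
	ssize_t cnt, byte_count = 0;

	for (;;) {
		cnt = read(filefd, buf, sizeof(buf));
		if (cnt == 0)
			return (byte_count);	/* end of file */
		if (cnt < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry the read */
			return (-1);
		}
		/* write(2) may accept fewer bytes than asked: loop until done. */
		for (ssize_t off = 0; off < cnt; ) {
			ssize_t w = write(netfd, buf + off, (size_t)(cnt - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return (-1);
			}
			off += w;
		}
		byte_count += cnt;
	}
}
```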


History of sendfile(2) sendfile(2) introduced

sendfile(2) introduced

int
sendfile(int fd, int s, off_t offset, size_t nbytes, .. );

1997: HP-UX 11.00
1998: FreeBSD 3.0 and Linux 2.2
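The systems listed above ended up with different prototypes, so portable callers usually hide the call behind a wrapper. The sketch below is one way to do that; the helper name send_file_range and the pread(2)/write(2) fallback are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#if defined(__FreeBSD__)
#include <sys/uio.h>
#elif defined(__linux__)
#include <sys/sendfile.h>
#endif

/*
 * Hypothetical helper: send up to nbytes of filefd, starting at offset,
 * to stream socket s. Returns the number of bytes sent, or -1 on error.
 * nbytes must be non-zero (on FreeBSD, 0 means "until end of file").
 */
ssize_t
send_file_range(int filefd, int s, off_t offset, size_t nbytes)
{
#if defined(__FreeBSD__)
	off_t sbytes = 0;

	/* FreeBSD: file first, socket second, byte count returned via sbytes. */
	if (sendfile(filefd, s, offset, nbytes, NULL, &sbytes, 0) == -1 &&
	    errno != EAGAIN && errno != EINTR)
		return (-1);
	return ((ssize_t)sbytes);
#elif defined(__linux__)
	/* Linux: socket first, file second, offset passed by pointer. */
	return (sendfile(s, filefd, &offset, nbytes));
#else
	/* Fallback for other systems: one bounded pread(2) + write(2). */
	char buf[4096];
	size_t want = nbytes < sizeof(buf) ? nbytes : sizeof(buf);
	ssize_t n = pread(filefd, buf, want, offset);

	if (n <= 0)
		return (n);
	return (write(s, buf, (size_t)n));
#endif
}
```

Like the underlying calls, the helper may send fewer bytes than requested, so a caller is expected to advance the offset and retry.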


History of sendfile(2) sendfile(2) in FreeBSD

sendfile(2) in FreeBSD

First implementation: mapping the userland loop into the kernel:

read(filefd) → VOP_READ(vnode)
write(netfd) → sosend(socket)
blksize → PAGE_SIZE

Further optimisations:
2004: SF_NODISKIO flag
2006: inner loop, working on sbspace() bytes
2013: sending data from a shared memory descriptor


What’s not right with sendfile(2) blocking on I/O

Problem #1: blocking on I/O

Algorithm of a modern HTTP server:
1 Take yet another descriptor from kevent(2)
2 Do write(2)/read(2)/sendfile(2) on it
3 Go to 1

Bottleneck: the time spent in any syscall; while one call blocks on disk I/O, every other descriptor waits.
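The loop can be sketched in a few lines. To keep the sketch portable it uses poll(2) as a stand-in for kevent(2); the names event_loop_once, io_handler_t and drain_one are assumptions made for illustration.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

typedef void (*io_handler_t)(int fd);

/* Example handler: consume one byte and count the call. */
int handled;

void
drain_one(int fd)
{
	char c;

	if (read(fd, &c, 1) == 1)
		handled++;
}

/*
 * One pass of the loop: wait for readiness, then run the handler on
 * every ready descriptor. Returns the number of descriptors serviced,
 * 0 on timeout, -1 on error.
 */
int
event_loop_once(struct pollfd *fds, nfds_t nfds, int timeout_ms,
    io_handler_t handler)
{
	int served = 0;
	int n = poll(fds, nfds, timeout_ms);

	if (n <= 0)
		return (n);
	for (nfds_t i = 0; i < nfds; i++) {
		if (fds[i].revents & (POLLIN | POLLOUT)) {
			/* If this call blocks, all other descriptors wait. */
			handler(fds[i].fd);
			served++;
		}
	}
	return (served);
}
```

The single thread is the design point: readiness notification keeps the handlers from blocking on the network, but nothing here protects against a handler that blocks on disk.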


What’s not right with sendfile(2) blocking on I/O

Attempts to solve problem #1

Separate I/O contexts: processes, threads
    Apache
    nginx 2

SF_NODISKIO + aio_read(2)
    nginx
    Varnish


What’s not right with sendfile(2) blocking on I/O

More attempts ...

aio_mlock(2) instead of aio_read(2)
aio_sendfile(2) ???


What’s not right with sendfile(2) control over VM

Problem #2: control over VM

VOP_READ() leaves pages in VM cache
VOP_READ() [for UFS] does readahead

Not easy to prevent it from doing that!


New sendfile(2) implementation above pager

what if VOP_GETPAGES()?

VOP_READ() → VOP_GETPAGES()

Pros:
sendfile() already works on pages
implementations for vnode and shmem converge
control over VM becomes an easier task

Cons:
Losing readahead heuristics :(
But no one used them! :)


New sendfile(2) VOP_GETPAGES_ASYNC()

VOP_GETPAGES_ASYNC()

int
VOP_GETPAGES(struct vnode *vp, vm_page_t *ma,
    int count, int reqpage);

1 Initialize buf(9)
2 buf->b_iodone = bdone;
3 bstrategy(buf);
4 bwait(buf); /* sleeps until I/O completes */
5 return;


int
VOP_GETPAGES_ASYNC(struct vnode *vp,
    vm_page_t *ma, int count, int reqpage,
    vop_getpages_iodone_t *iodone, void *arg);

1 Initialize buf(9)
2 buf->b_iodone = vnode_pager_async_iodone;
3 bstrategy(buf);
4 return;

vnode_pager_async_iodone calls iodone().


New sendfile(2) non-blocking sendfile(2)

naive non-blocking sendfile(2)

In kern_sendfile():
1 nios++;
2 VOP_GETPAGES_ASYNC(sendfile_iodone);

In sendfile_iodone():
1 nios--;
2 if (nios) return;
3 sosend();
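The counting scheme can be tried out in userland. The sketch below is a simulation, not kernel code: getpages_async(), complete_one(), kern_sendfile_sim() and struct sf_io are hypothetical stand-ins, and the "disk" is a small queue of callbacks that completes only when asked to.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*iodone_t)(void *arg);

/* Hypothetical state of one sendfile(2) call in flight. */
struct sf_io {
	int nios;	/* outstanding page-in requests */
	int sent;	/* set when the stand-in for sosend() has run */
};

/* Hypothetical queue of pending "disk I/Os". */
struct pending {
	iodone_t iodone;
	void *arg;
};
struct pending queue[16];
size_t queue_len;

/* Stand-in for VOP_GETPAGES_ASYNC(): record the callback and return at once. */
int
getpages_async(iodone_t iodone, void *arg)
{
	if (queue_len == sizeof(queue) / sizeof(queue[0]))
		return (-1);
	queue[queue_len].iodone = iodone;
	queue[queue_len].arg = arg;
	queue_len++;
	return (0);
}

/* Stand-in for the disk finishing the oldest request. Returns 0 if idle. */
int
complete_one(void)
{
	struct pending p;

	if (queue_len == 0)
		return (0);
	p = queue[0];
	for (size_t i = 1; i < queue_len; i++)
		queue[i - 1] = queue[i];
	queue_len--;
	p.iodone(p.arg);
	return (1);
}

/* Completion callback: the last I/O to finish triggers the send. */
void
sendfile_iodone(void *arg)
{
	struct sf_io *sfio = arg;

	sfio->nios--;
	if (sfio->nios)
		return;
	sfio->sent = 1;		/* where the kernel would call sosend() */
}

/* Issue nreq asynchronous reads and return without waiting for them. */
void
kern_sendfile_sim(struct sf_io *sfio, int nreq)
{
	for (int i = 0; i < nreq; i++) {
		sfio->nios++;
		if (getpages_async(sendfile_iodone, sfio) != 0)
			sfio->nios--;	/* request was not queued */
	}
}
```

In this simulation completions run only when complete_one() is called, so the counter cannot reach zero while requests are still being issued; code with truly concurrent completions would need to guard against that.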


New sendfile(2) non-blocking sendfile(2)

the problem of naive implementation

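sendfile(filefd, sockfd, ..);
write(sockfd, ..);

With the naive scheme, sosend() of the file data is deferred until the disk I/O completes, so the data from write(2) can enter the socket buffer first and reorder the stream.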


New sendfile(2) “not ready” data in socket buffers

socket buffer

mbuf mbuf mbuf mbuf mbuf mbuf

struct sockbuf
    struct mbuf *sb_mb
    struct mbuf *sb_mbtail
    u_int sb_cc


New sendfile(2) “not ready” data in socket buffers

socket buffer with “not ready” data

mbuf mbuf mbuf mbuf mbuf mbuf

page page

struct sockbuf
    struct mbuf *sb_mb
    struct mbuf *sb_fnrdy
    struct mbuf *sb_mbtail
    u_int sb_acc
    u_int sb_ccc
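A minimal userland model of this bookkeeping, assuming sb_ccc counts every byte placed in the buffer and sb_acc counts only the bytes that may be sent, that is, those ahead of the first not-ready mbuf. The struct layouts are cut down to the fields on the slide, and M_NOTREADY's value, sb_append() and sb_ready() are hypothetical simplifications, not the kernel's functions.

```c
#include <assert.h>
#include <stddef.h>

#define M_NOTREADY 0x1		/* flag value is an assumption of this sketch */

struct mbuf {
	struct mbuf *m_next;
	unsigned m_len;
	int m_flags;
};

struct sockbuf {
	struct mbuf *sb_mb;	/* first mbuf in the chain */
	struct mbuf *sb_fnrdy;	/* first not-ready mbuf, NULL if none */
	struct mbuf *sb_mbtail;	/* last mbuf in the chain */
	unsigned sb_acc;	/* bytes available for sending */
	unsigned sb_ccc;	/* bytes claimed, ready or not */
};

/* Append one mbuf; it becomes available only if nothing ahead is pending. */
void
sb_append(struct sockbuf *sb, struct mbuf *m)
{
	m->m_next = NULL;
	if (sb->sb_mbtail != NULL)
		sb->sb_mbtail->m_next = m;
	else
		sb->sb_mb = m;
	sb->sb_mbtail = m;
	sb->sb_ccc += m->m_len;
	if (m->m_flags & M_NOTREADY) {
		if (sb->sb_fnrdy == NULL)
			sb->sb_fnrdy = m;
	} else if (sb->sb_fnrdy == NULL)
		sb->sb_acc += m->m_len;
}

/* Mark one mbuf ready; release data only when the blocking mbuf is ready. */
void
sb_ready(struct sockbuf *sb, struct mbuf *m)
{
	m->m_flags &= ~M_NOTREADY;
	if (m != sb->sb_fnrdy)
		return;		/* still queued behind earlier pending data */
	while (m != NULL && (m->m_flags & M_NOTREADY) == 0) {
		sb->sb_acc += m->m_len;
		m = m->m_next;
	}
	sb->sb_fnrdy = m;	/* next pending mbuf, or NULL */
}
```

Because data queued behind a not-ready mbuf stays unavailable, a write(2) issued after sendfile(2) keeps its place in the stream even when the disk I/O finishes late or out of order.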


New sendfile(2) final implementation

non-blocking sendfile(2)

In kern_sendfile():
1 nios++;
2 VOP_GETPAGES_ASYNC(sendfile_iodone);
3 sosend(NOT_READY);

In sendfile_iodone():
1 nios--;
2 if (nios) return;
3 soready();


New sendfile(2) comparison with old sendfile(2)

[Figure: traffic]


New sendfile(2) comparison with old sendfile(2)

[Figure: CPU idle]


New sendfile(2) comparison with old sendfile(2)

profiling sendfile(2) in head

aio_daemon    13.64%
sys_sendfile   7.40%
t4_intr        5.66%
xpt_done       1.04%
pagedaemon     4.16%
scheduler      5.28%


New sendfile(2) comparison with old sendfile(2)

profiling new sendfile(2)

sys_sendfile  16.9%  (vm_page_grab 9.24% !!)
t4_intr        8.17% (tcp_output() 2.07% !!)
xpt_done       9.91% (m_freem() 3.11% !!)
pagedaemon     6.54%
scheduler      3.58%


New sendfile(2) comparison with old sendfile(2)

what changed?

New code always sends a full socket buffer
Which is good for TCP (as a protocol)
Which hurts the VM, the mbuf allocator, and, unexpectedly, the TCP stack

Will fix that!


New sendfile(2) comparison with old sendfile(2)

[Figure: old sendfile(2) @ Netflix]


New sendfile(2) comparison with old sendfile(2)

[Figure: new sendfile(2) @ Netflix]


New sendfile(2) plans and problems

TODO list

Problems:
VM & I/O overcommit
ZFS
SCTP

Future plans:
sendfile(2) doing TLS


New sendfile(2)

Questions?
