LAIO: Lazy Asynchronous I/O For Event Driven Servers
Khaled Elmeleegy
Alan L. Cox

Outline
• Available I/O APIs and their shortcomings.
• Event driven programming and its challenges.
• Lazy Asynchronous I/O (LAIO).
• Experiments and results.
• Conclusions.

Key Idea
• Existing I/O APIs fall short of the needs of event driven servers.
• LAIO fixes that.

Non-Blocking I/O
• A system call may return without fully completing the operation.
  • Example: a write to a socket.
• A system call may also return with the operation completed.
• Disadvantages:
  • Not available for disk operations.
  • The program using it needs to maintain state.
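
The "needs to maintain state" point can be made concrete with a short C sketch. It assumes a POSIX system; the struct pending_write type and the helper names are hypothetical and exist only for this illustration. The caller has to remember how many bytes the kernel has accepted so that a later call can resume where the previous one stopped.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical per-connection state: what is left to send. */
struct pending_write {
    const char *buf;   /* start of the data */
    size_t      len;   /* total number of bytes to send */
    size_t      done;  /* bytes already accepted by the kernel */
};

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Push as much of the pending data as the kernel will take right now.
 * Returns 1 when everything has been written, 0 when the descriptor would
 * block (call again once it is writable), and -1 on any other error.
 */
static int continue_write(int fd, struct pending_write *pw)
{
    assert(pw->done <= pw->len);
    while (pw->done < pw->len) {
        ssize_t n = write(fd, pw->buf + pw->done, pw->len - pw->done);
        if (n > 0) {
            pw->done += (size_t)n;   /* partial completion: remember progress */
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return 0;                /* would block: go back to the event loop */
        } else if (n == -1 && errno == EINTR) {
            continue;                /* interrupted by a signal: retry */
        } else {
            return -1;
        }
    }
    return 1;
}
```

An event driven server would call continue_write again each time the descriptor is reported writable, until it returns 1.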

Asynchronous I/O (AIO)
• The system call returns immediately.
• The operation always runs to completion and sends a notification on completion.
  • Via signal, event, or polling.
• Disadvantages:
  • Missing disk operations like open and stat.
  • Completion is always received via a notification, even if the operation didn't block.
  • Lower performance.

Event Driven Programming with I/O (What we have)

event_loop(..) {
    …
    while (true) {
        event_list = get available events;
        for each event ev in event_list do
            call handler of ev;
    }
}

handler(…) {
    …
    /* do stuff 1 */
    open(..);  /* may block */
    …
    /* do stuff 2 */
    return;  /* to event_loop */
}

If open blocks, the server stalls.

Event Driven Programming with I/O (What we want)

event_loop(..) {
    …
    while (true) {
        event_list = get available events;
        for each event ev in event_list do
            call event_handler of ev;
    }
}

handler1(…) {
    …
    /* do stuff 1 */
    open(..);  /* may block */
    if open blocks {
        set handler2 as callback for open;
        return;  /* to event_loop */
    }
    …
    /* do stuff 2 */
    return;  /* to event_loop */
}

handler2(…) {
    …
    /* do stuff 2 */
    return;  /* to event_loop */
}
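
The split-handler pattern above can be sketched as a small self-contained C simulation. Everything here is hypothetical scaffolding, not code from the talk: the event queue stands in for "get available events", start_open only pretends to open a file, and the open_would_block field is a knob that chooses whether the simulated open blocks. When it "blocks", its completion is delivered later as an event that runs handler2.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 16

typedef void (*handler_fn)(void *ctx);

struct event { handler_fn fn; void *ctx; };

/* A tiny FIFO of ready events, standing in for "get available events". */
struct event_queue {
    struct event items[MAX_EVENTS];
    size_t head, tail;
};

static int post_event(struct event_queue *q, handler_fn fn, void *ctx)
{
    if (q->tail - q->head == MAX_EVENTS)
        return -1;                               /* queue full */
    q->items[q->tail % MAX_EVENTS] = (struct event){ fn, ctx };
    q->tail++;
    return 0;
}

/* Dispatch events until none are left; returns how many handlers ran. */
static size_t run_event_loop(struct event_queue *q)
{
    size_t dispatched = 0;
    while (q->head != q->tail) {
        struct event ev = q->items[q->head % MAX_EVENTS];
        q->head++;
        ev.fn(ev.ctx);                           /* call handler of ev */
        dispatched++;
    }
    return dispatched;
}

struct request {
    struct event_queue *loop;
    int open_would_block;  /* simulation knob, not a real kernel signal */
    int fd;
    int stage;             /* 0 = new, 1 = stuff 1 done, 2 = stuff 2 done */
};

static void handler2(void *ctx)
{
    struct request *r = ctx;
    r->stage = 2;                                /* do stuff 2 */
}

/* Simulated open: returns 0 if it completed at once, -1 if it "blocked". */
static int start_open(struct request *r)
{
    r->fd = 3;                                   /* pretend descriptor */
    if (r->open_would_block) {
        /* Completion arrives later as an event that runs the callback. */
        post_event(r->loop, handler2, r);
        return -1;
    }
    return 0;
}

static void handler1(void *ctx)
{
    struct request *r = ctx;
    r->stage = 1;                                /* do stuff 1 */
    if (start_open(r) == -1)
        return;                                  /* blocked: back to event_loop */
    handler2(r);                                 /* did not block: continue inline */
}
```

Posting handler1 for one request that blocks and one that does not, then calling run_event_loop, dispatches three events: the non-blocking request finishes inside handler1, while the blocking one finishes in a separate handler2 event, so the loop never stalls.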

Lazy Asynchronous I/O (LAIO)
• Like AIO when an operation blocks: asynchronous completion notification.
• Also like AIO, operations are done in one shot, with no partial completions.
• Similar to non-blocking I/O if the operation completes without blocking.
• Based on scheduler activations.
  • A scheduler activation is an upcall delivered by the kernel when a thread blocks or unblocks.

LAIO API

Function Name                     Description
int laio_syscall(int number, …)   Performs the specified syscall asynchronously.
void* laio_gethandle(void)        Returns a handle to the last laio operation.
laio_list laio_poll(void)         Returns a list of handles to completed laio operations.

laio_syscall(int number, …)
• Enable upcalls, save context, and invoke the system call.
• Does the system call block?
  • No: disable upcalls and return retval.
  • Yes: upcall_handler(..) is invoked via a kernel upcall and steals the old stack using the stored context; laio_syscall then sets errno = EINPROGRESS and returns -1.
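
From the caller's side, the two exits of laio_syscall might be handled as in the following C sketch. The LAIO library itself is not used here: laio_syscall and laio_gethandle are replaced by clearly marked mocks whose signatures are taken from the API slide and whose behaviour (block or not) is controlled by a test knob. The op_state enum, struct continuation, and issue_syscall are hypothetical names for this illustration. laio_poll is left out because the slides do not show the layout of laio_list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* ---- MOCK, for illustration only: NOT the real LAIO library. ---- */
static int   mock_next_call_blocks;  /* knob: pretend the next call blocks */
static int   mock_op_slots[8];       /* their addresses stand in for handles */
static int   mock_ops_started;
static void *mock_last_handle;

static int laio_syscall(int number, ...)
{
    (void)number;  /* the mock ignores which system call was requested */
    if (mock_next_call_blocks) {
        mock_last_handle = &mock_op_slots[mock_ops_started++ % 8];
        errno = EINPROGRESS;         /* "blocked": completion comes later */
        return -1;
    }
    return 0;                        /* completed without blocking */
}

static void *laio_gethandle(void)
{
    return mock_last_handle;
}
/* ---- end of mock ---- */

enum op_state { OP_DONE, OP_PENDING, OP_FAILED };

/* Hypothetical record tying an in-flight operation to its continuation. */
struct continuation {
    void  *handle;              /* value returned by laio_gethandle() */
    void (*resume)(void *ctx);  /* handler to run once the operation completes */
    void  *ctx;
};

/*
 * Issue a system call through laio_syscall and classify the outcome.
 * OP_DONE:    it finished without blocking; *retval holds the result.
 * OP_PENDING: it blocked; k->handle identifies it for a later completion.
 * OP_FAILED:  it failed for a reason other than blocking; errno is set.
 */
static enum op_state issue_syscall(int number, struct continuation *k, int *retval)
{
    int rv = laio_syscall(number);
    if (rv == -1 && errno == EINPROGRESS) {
        k->handle = laio_gethandle();
        return OP_PENDING;
    }
    *retval = rv;
    return rv == -1 ? OP_FAILED : OP_DONE;
}
```

In the OP_PENDING case the handler would store the continuation and return to the event loop; matching completed handles against stored continuations is the part that laio_poll would serve in a real program.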

Experiments and Experimental Setup
• Performance evaluated using both micro-benchmarks and event driven web servers (thttpd and Flash).
• Machines: 2.4 GHz Pentium Xeon with 2 GB RAM.
• FreeBSD-5 with KSE, FreeBSD's scheduler activation implementation.
• Two web traces, Rice and Berkeley, with working set sizes of 1.1 GB and 6.4 GB respectively.

Micro-benchmarks
• Read a byte from a pipe 100,000 times, in two cases, blocking and non-blocking:
  • For non-blocking (byte ready on pipe):
    • LAIO is 320% faster than AIO.
    • LAIO is 40% slower than non-blocking I/O.
  • For blocking (byte not ready on pipe):
    • AIO is 8% faster than LAIO.
• Call getpid(2) 1,000,000 times, in two cases, KSE enabled and disabled.
  • When disabled, the program was 5% faster (KSE overhead).
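
A timing loop of the kind the getpid(2) micro-benchmark describes might look like the C sketch below. It is not the authors' benchmark program: the function name and the use of CLOCK_MONOTONIC are assumptions, and it only measures the loop itself. Comparing KSE enabled against KSE disabled would mean running it under both configurations. Some C libraries have cached getpid() results, in which case the loop would not enter the kernel on every iteration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>
#include <unistd.h>

/*
 * Call getpid() `iterations` times and return the average cost of one call
 * in nanoseconds, or -1.0 if the arguments or the clock are unusable.
 */
static double time_getpid_ns(long iterations)
{
    struct timespec start, end;
    volatile pid_t sink = 0;   /* keeps the calls from being optimized away */

    if (iterations <= 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    for (long i = 0; i < iterations; i++)
        sink = getpid();
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;
    assert(sink > 0);          /* a process ID is always positive */

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Calling time_getpid_ns(1000000) matches the iteration count on the slide; the absolute number depends on the machine and kernel, so only the ratio between two configurations is meaningful.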

thttpd Experiments
• thttpd is an event driven server, modified to use libevent, an event notification library.
• Two versions of thttpd: libevent-thttpd and LAIO-thttpd.
• For LAIO-thttpd, thttpd was modified by breaking up event handlers around blocking operations like open.

thttpd Results
[Figure: Berkeley throughput]
[Figure: Berkeley response time]
[Figure: Rice throughput]
[Figure: Rice response time]
[Figure: Rice throughput, 512 MB RAM]
[Figure: Rice response time, 512 MB RAM]

Flash
• An event driven web server.
• 3 flavors:
  • Pure event driven.
  • AMPED: Asymmetric Multiprocess Event Driven.
    • Event driven core.
    • Potentially blocking I/O handed off to a helper process.
    • Helper does an explicit read to bring data into memory.
  • LAIO: uses LAIO to do all I/O asynchronously.
• For each of the three flavors, files are sent either with sendfile(2) or using mmap(2).

Flash Experiments
• All experiments are done with 500 clients.
• All sockets are blocking.
• For mmap: the file is mapped to memory, then written to the socket.
  • Page faults may happen.
  • mincore(2) is used to check if pages are in memory.
• For sendfile: the file is sent via the sendfile(2) syscall, which may block.
  • Optimized sendfile: the kernel is modified so that sendfile returns if blocking on disk occurs.
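
The mincore(2) residency check mentioned for the mmap case can be sketched in C as follows. This is an illustration, not Flash's code: file_is_resident is a hypothetical helper, and the vec argument uses the Linux prototype (unsigned char *), whereas FreeBSD declares it as char *. The answer is also only a snapshot, since pages can be evicted right after the call.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a file and ask the kernel which of its pages are in memory.
 * Returns 1 if every page is resident, 0 if at least one is not,
 * and -1 on error (including an empty file, which cannot be mapped).
 */
static int file_is_resident(const char *path)
{
    int result = -1;
    struct stat st;

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    assert(npages > 0);

    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc(npages);
    if (vec != NULL && mincore(map, len, vec) == 0) {
        result = 1;
        for (size_t i = 0; i < npages; i++) {
            if (!(vec[i] & 1)) {     /* low bit set means the page is resident */
                result = 0;
                break;
            }
        }
    }
    free(vec);
    munmap(map, len);
    return result;
}
```

A server following the mmap approach on the slide would write the mapped file to the socket directly when the check returns 1, and otherwise take whatever path it uses for I/O that may block.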

Flash Throughput (mmap)

Configuration    Flash-event (mmap)   Flash-AMPED (mmap)   Flash-LAIO (mmap)
Berkeley-Cold    81 Mbps              134 Mbps             132 Mbps
Berkeley-Warm    78 Mbps              127 Mbps             131 Mbps
Rice-Cold        203 Mbps             386 Mbps             299 Mbps
Rice-Warm        830 Mbps             800 Mbps             797 Mbps

• For Rice-Cold: 41072 callouts to the helper process for AMPED; 46486 page faults for LAIO. The performance difference is due to prefetching.

Flash Throughput (sendfile)

Configuration    Flash-event (sendfile)   Flash-AMPED (sendfile)   Flash-LAIO (sendfile)
Berkeley-Cold    122 Mbps                 171 Mbps                 171 Mbps
Berkeley-Warm    125 Mbps                 180 Mbps                 179 Mbps
Rice-Cold        277 Mbps                 398 Mbps                 382 Mbps
Rice-Warm        845 Mbps                 843 Mbps                 815 Mbps

Conclusions
• LAIO overcomes the shortcomings of other I/O APIs.
  • LAIO is more than 3 times faster than AIO when data is in memory.
• LAIO serves event driven servers well.
  • LAIO increased thttpd throughput by 38%.
  • LAIO matched Flash performance with no kernel modifications.

Questions?