
Splice, Tee & VMsplice: zero copy in Linux

Date post: 13-Nov-2014
Upload: tuxologynet
Description:
Slides for a lecture about using the splice, tee and vmsplice system calls in Linux to achieve zero-copy data transfer in high-performance applications

Transcript

Splice, Tee & Vmsplice: Zero Copy in Linux
Herzelinux
http://tuxology.net

Copyright 2004-2006, Michael Opdenacker. Copyright 2003-2006, Oron Peled. Copyright 2004-2006 Codefidence Ltd.
For full copyright information see last page. Creative Commons Attribution-ShareAlike 2.0 license.


Rights to copy

Attribution-ShareAlike 2.0

You are free
  to copy, distribute, display, and perform the work
  to make derivative works
  to make commercial use of the work
Under the following conditions
  Attribution. You must give the original author credit.
  Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one.
For any reuse or distribution, you must make clear to others the license terms of this work.
Any of these conditions can be waived if you get permission from the copyright holder.
Your fair use and other rights are in no way affected by the above.
License text: http://creativecommons.org/licenses/by-sa/2.0/legalcode

This kit contains work by the following authors:
  Copyright 2004-2006 Michael Opdenacker, http://www.free-electrons.com
  Copyright 2003-2006 Oron Peled, http://www.actcom.co.il/~oron
  Copyright 2004-2008 Codefidence Ltd., http://www.codefidence.com


Kernel architecture

[Diagram, top to bottom]
User space: App1, App2, C library
Kernel space: System call interface; Process management, Memory management, Filesystem support, Device control, Networking; CPU support code, CPU/MMU support code, Filesystem types, Storage drivers, Character device drivers, Network device drivers
Hardware: CPU, RAM, Storage, ...


Kernel Mode vs. User Mode
All modern CPUs support a dual mode of operation:
  User mode, for regular tasks.
  Supervisor (or privileged) mode, for the kernel.
The mode the CPU is in determines which instructions the CPU is willing to execute:
  Sensitive instructions will not be executed when the CPU is in user mode.
The CPU mode is determined by one of the CPU registers, which stores the current "Ring Level":
  0 for supervisor mode, 3 for user mode, 1-2 unused by Linux.


The System Call Interface
When a user space task needs to use a kernel service, it makes a "System Call".
The C library places the parameters and the number of the system call in registers and then issues a special trap instruction.
The trap atomically changes the ring level to supervisor mode and sets the instruction pointer to the kernel.
The kernel finds the required system call via the system call table and executes it.
Returning from the system call does not require a special instruction, since in supervisor mode the ring level can be changed directly.
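To make the path above concrete, here is a minimal sketch that asks the kernel for the process ID twice: once through glibc's generic syscall() function, which takes the system call number explicitly, and once through the ordinary getpid() wrapper. The two helper names are illustrative and not from the slides, and the actual trap instruction used underneath depends on the architecture.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Issue the getpid system call by number. syscall() loads the number and
 * any arguments into registers and executes the trap instruction. */
pid_t pid_via_raw_syscall(void)
{
    long ret = syscall(SYS_getpid);

    assert(ret > 0);            /* getpid cannot fail */
    return (pid_t)ret;
}

/* The same request through the usual C library wrapper. */
pid_t pid_via_wrapper(void)
{
    return getpid();
}
```

Both helpers should return the same value, since they end up in the same entry of the kernel's system call table.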


Linux System Call Path

[Diagram] Task -> Glibc -> trap -> Kernel: entry.S -> function call -> sys_name() -> do_name()


Exchanging Data With User Space (1)
In kernel code, you can't just memcpy between an address supplied by user space and the address of a buffer in kernel space!
  They correspond to completely different address spaces (thanks to virtual memory).
  The user space address may be swapped out to disk.
  The user space address may be invalid (user space process trying to access unauthorized data).


Exchanging Data With User Space (2)
You must use dedicated functions such as the following ones in your read and write file operations code:
  #include <linux/uaccess.h>
  unsigned long copy_to_user(void __user *to, const void *from, unsigned long n);
  unsigned long copy_from_user(void *to, const void __user *from, unsigned long n);
Make sure that these functions return 0! Any other return value is the number of bytes that could not be copied and means that they failed.


DMA Offload Engine
A DMA (Direct Memory Access) offload engine is a piece of hardware that performs memcpy in hardware other than the CPU.
  Example: Intel I/OAT (I/O Acceleration Technology).
It makes the copy the job of an entity other than the CPU.
It's zero copy, if by copy you mean copy by the CPU.


Simple Client/Server Copies

[Diagram]
Client: Rx -> Kernel -> copy to user -> user space application: ret = recv(s, buf)
Server: user space application: ret = send(s, buf) -> copy from user -> Kernel -> Tx


Simple Client/Server Copies

[Diagram]
Client: Rx -> DMA -> Kernel -> copy to user -> user space application: ret = recv(s, buf)
Server: Disk -> DMA -> Kernel -> user space application: ret = read(s, buf) ... ret = send(s, buf) -> copy from user -> Kernel -> DMA -> Tx
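For comparison with the zero-copy calls that follow, the conventional server path in the diagram can be sketched as a plain read()/write() loop. This is an illustrative baseline, not code from the slides: the function name is made up, and read()/write() stand in for the read()/send() pair shown in the diagram since they work on any file descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd through a user space buffer.
 * Every byte crosses the user/kernel boundary twice.
 * Returns the number of bytes copied, or -1 on error. */
long copy_via_user_buffer(int in_fd, int out_fd)
{
    char buf[4096];
    long total = 0;

    assert(in_fd >= 0 && out_fd >= 0);

    for (;;) {
        /* First copy: kernel -> user (copy to user). */
        ssize_t n = read(in_fd, buf, sizeof(buf));
        if (n == 0)
            break;                      /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        /* Second copy: user -> kernel (copy from user). */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;
}
```

The application never inspects buf here, which is exactly the situation where the two CPU copies are wasted work.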


Zero Copy
In-kernel buffer that the user has control over.
The buffer is implemented as a set of reference-counted pointers which the kernel copies around without actually copying the data.
splice() moves data to/from the buffer from/to an arbitrary file descriptor.
tee() copies data from one buffer to another.
vmsplice() does the same as splice(), but instead of splicing from fd to fd as splice() does, it splices from a user address range into a pipe.
Can be used anywhere a process needs to send something from one end to another but doesn't need to touch or even look at the data, just forward it.


Zero Copy
In-kernel buffer that the user has control over.
  Implemented as a pipe.
  The pipe buffer is implemented as a set of reference-counted pointers which the kernel copies around without actually copying the data.
tee(), splice() and vmsplice() move data from the user program to the pipe and from one pipe to the next, without copying.
Use when a process needs to send something from one end to another, but doesn't need to touch or even look at the data.
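As a rough illustration of the "user program to the pipe" step, the sketch below hands a user buffer to a pipe with vmsplice(). The helper name is hypothetical. Without SPLICE_F_GIFT the pipe only references the caller's pages, so this sketch assumes the caller leaves the buffer unmodified until the data has been consumed from the pipe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

/* Splice len bytes starting at data into the pipe write end pipe_wr.
 * Returns the number of bytes queued, or -1 on error. */
long user_buffer_to_pipe(int pipe_wr, char *data, size_t len)
{
    size_t done = 0;

    assert(data != NULL);

    while (done < len) {
        struct iovec iov = {
            .iov_base = data + done,
            .iov_len  = len - done,
        };
        /* May queue fewer bytes than requested when the pipe fills up. */
        ssize_t n = vmsplice(pipe_wr, &iov, 1, 0);
        if (n < 0)
            return -1;
        done += (size_t)n;
    }
    return (long)done;
}
```

The read end of the pipe can then be passed to splice() to forward the data to a file or socket.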


Splice
splice(int fd_in, off_t *off_in, int fd_out, off_t *off_out, size_t len, unsigned int flags);
splice() moves data to (from) the pipe from (to) an arbitrary file descriptor.
sendfile() is now internally implemented as splice().
Must use the SPLICE_F_MOVE flag to achieve zero copy, if possible (whole pages with a buffer reference count of zero).
Other flags: SPLICE_F_NONBLOCK, and SPLICE_F_MORE, which works like TCP_CORK.
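A small sketch of the offset arguments may help here: passing a non-NULL off_in makes splice() read from that position and leaves the file's own position alone. The helper name is illustrative, and loff_t is used for the offset because that is the 64-bit type glibc's prototype resolves to.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Move up to len bytes of file_fd, starting at byte offset, into the pipe
 * write end pipe_wr. Returns the number of bytes moved, or -1 on error. */
long file_range_to_pipe(int file_fd, long offset, int pipe_wr, size_t len)
{
    loff_t off = offset;

    assert(offset >= 0);

    /* off_out must be NULL because the output side is a pipe. */
    return splice(file_fd, &off, pipe_wr, NULL, len, SPLICE_F_MOVE);
}
```

One end of every splice() call has to be a pipe, which is why the destination here is a pipe rather than a second file.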


Tee
long tee(int fd_in, int fd_out, size_t len, unsigned int flags);
tee() moves (read: copies a reference to) data from one pipe buffer to the other.
The source pipe still holds the data.
The only useful flag is SPLICE_F_NONBLOCK.
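The point that the source pipe still holds the data can be checked with a minimal sketch like the one below. The helper name is made up for illustration. Both file descriptors must refer to pipes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Duplicate up to len bytes waiting in the pipe src_rd into the pipe
 * dst_wr without consuming them from the source.
 * Returns the number of bytes duplicated, or -1 on error. */
long duplicate_pipe_data(int src_rd, int dst_wr, size_t len)
{
    assert(src_rd != dst_wr);

    /* With SPLICE_F_NONBLOCK, tee() fails with EAGAIN instead of
     * blocking when the source pipe is empty. */
    return tee(src_rd, dst_wr, len, SPLICE_F_NONBLOCK);
}
```

After a successful call the same bytes can be read from both pipes, so the source still has to be drained separately, for example with splice().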


Zero Copy Example 1: Splice()

[Diagram]
User space
Kernel memory: File (pointer to page cache page) and Socket Buf (pointer to page as part of frag list) both reference the same Data; only the pointer is copied*
Hardware: HD Controller and Network Chip each copy the Data using DMA

* In reality you have to do two splice calls: one from the file to an intermediate pipe and one from the pipe to the socket buffers.
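The two-call pattern from the footnote might look roughly like the following sketch. The function name is hypothetical, the intermediate pipe is created and destroyed per call for simplicity, and a real server would more likely keep one pipe per connection.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Send up to len bytes from the current position of file_fd to sock_fd
 * through an intermediate pipe. Returns bytes sent, or -1 on error. */
long splice_file_to_socket(int file_fd, int sock_fd, size_t len)
{
    int p[2];
    long total = 0;

    assert(file_fd >= 0 && sock_fd >= 0);
    if (pipe(p) < 0)
        return -1;

    while (len > 0) {
        /* Call 1: file -> pipe. The pipe takes page references. */
        ssize_t in = splice(file_fd, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (in < 0) {
            total = -1;
            break;
        }
        if (in == 0)
            break;                      /* end of file */
        len -= (size_t)in;

        /* Call 2: pipe -> socket, until this chunk is drained. */
        while (in > 0) {
            ssize_t out = splice(p[0], NULL, sock_fd, NULL,
                                 (size_t)in, SPLICE_F_MOVE);
            if (out <= 0) {
                close(p[0]);
                close(p[1]);
                return -1;
            }
            in -= out;
            total += out;
        }
    }

    close(p[0]);
    close(p[1]);
    return total;
}
```

The data never enters a user space buffer; the application only passes file descriptors and lengths.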


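Tee Implemented using Tee & Splice

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <assert.h>
    #include <errno.h>
    #include <limits.h>

    int main(int argc, char *argv[])
    {
        int fd;
        int len, slen;

        assert(argc == 2);

        fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1) {
            perror("open");
            exit(EXIT_FAILURE);
        }

        do {
            /*
             * tee stdin to stdout.
             */
            len = tee(STDIN_FILENO, STDOUT_FILENO,
                      INT_MAX, SPLICE_F_NONBLOCK);
            if (len < 0) {
                if (errno == EAGAIN)
                    continue;
                perror("tee");
                exit(EXIT_FAILURE);
            } else if (len == 0) {
                break;
            }

            /*
             * Consume stdin by splicing it to a file.
             */
            while (len > 0) {
                slen = splice(STDIN_FILENO, NULL, fd, NULL,
                              len, SPLICE_F_MOVE);
                if (slen < 0) {
                    perror("splice");
                    break;
                }
                len -= slen;
            }
        } while (1);

        close(fd);
        exit(EXIT_SUCCESS);
    }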

