Date post: | 13-Nov-2014 |
Upload: | tuxologynet |
Splice, Tee & Vmsplice: Zero Copy in Linux
Herzelinux
http://tuxology.net
Copyright 2006-2004, Michael Opdenacker. Copyright 2003-2006, Oron Peled. Copyright 2004-2006 Codefidence Ltd.
For full copyright information see last page. Creative Commons Attribution-ShareAlike 2.0 license
Rights to copy
This kit contains work by the following authors:
  Copyright 2004-2006 Michael Opdenacker, http://www.free-electrons.com
  Copyright 2003-2006 Oron Peled, http://www.actcom.co.il/~oron
  Copyright 2004-2008 Codefidence Ltd., http://www.codefidence.com
Attribution-ShareAlike 2.0
You are free
  to copy, distribute, display, and perform the work
  to make derivative works
  to make commercial use of the work
Under the following conditions
  Attribution. You must give the original author credit.
  Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one.
  For any reuse or distribution, you must make clear to others the license terms of this work.
  Any of these conditions can be waived if you get permission from the copyright holder.
Your fair use and other rights are in no way affected by the above.
License text: http://creativecommons.org/licenses/by-sa/2.0/legalcode
Kernel architecture
[Diagram, top to bottom]
  User space: App1, App2, C library
  Kernel space: System call interface; Process management, Memory management, Filesystem support, Device control, Networking; CPU support code, CPU/MMU support code, Filesystem types, Storage drivers, Character device drivers, Network device drivers
  Hardware: CPU, RAM, Storage, ...
Kernel Mode vs. User Mode
All modern CPUs support a dual mode of operation:
  User mode, for regular tasks.
  Supervisor (or privileged) mode, for the kernel.
The mode the CPU is in determines which instructions the CPU is willing to execute:
  Sensitive instructions will not be executed when the CPU is in user mode.
The CPU mode is determined by one of the CPU registers, which stores the current ring level (x86 terminology):
  0 for supervisor mode, 3 for user mode; 1 and 2 are unused by Linux.
The System Call Interface
When a user space task needs to use a kernel service, it makes a system call.
The C library places the system call number and its parameters in registers and then issues a special trap instruction.
The trap atomically changes the ring level to supervisor mode and sets the instruction pointer to a kernel entry point.
The kernel finds the requested system call via the system call table and executes it.
Returning from the system call does not require a trap, since code running in supervisor mode can lower the ring level directly.
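The path described above can be observed from user space. The following is a minimal sketch, assuming Linux with glibc: it obtains the process ID once through the usual C library wrapper and once by passing the system call number to the generic syscall() helper. The two function names are hypothetical and exist only for this illustration.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid: the system call number */
#include <sys/types.h>
#include <unistd.h>        /* getpid(), syscall() */

/* Ask for the process ID through the regular C library wrapper. */
pid_t pid_via_wrapper(void)
{
    return getpid();
}

/*
 * Ask for the same value by naming the system call number explicitly.
 * syscall() places the number and arguments in registers and issues
 * the trap, which is the job the wrapper normally hides.
 */
pid_t pid_via_raw_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Both functions should return the same value, because both end up in the same kernel handler.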
Linux System Call Path
[Diagram: Task -> (function call) -> Glibc -> (trap) -> Kernel: entry.S -> sys_name() -> do_name()]
Exchanging Data With User Space (1)
In kernel code, you can't just memcpy between an address supplied by user space and the address of a buffer in kernel space!
  They correspond to completely different address spaces (thanks to virtual memory).
  The user space address may be swapped out to disk.
  The user space address may be invalid (a user space process trying to access unauthorized data).
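The "invalid address" case can be seen from the user side of the interface. The sketch below, assuming Linux, hands write() a buffer inside a page mapped with no access rights; the kernel's checked copy from user space fails and the call reports an error instead of crashing anything. The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Pass the kernel a user-space address it cannot read and return the
 * errno that write() reports. Returns 0 if the write unexpectedly
 * succeeded, or -1 if the setup itself failed.
 */
int errno_for_unreadable_buffer(void)
{
    int fds[2];
    int result = -1;
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || pipe(fds) == -1)
        return -1;

    /* Reserve one page that may be neither read nor written. */
    void *bad = mmap(NULL, (size_t)page, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (bad != MAP_FAILED) {
        /* The kernel has to copy 16 bytes from this page into the pipe. */
        ssize_t n = write(fds[1], bad, 16);
        result = (n == -1) ? errno : 0;
        munmap(bad, (size_t)page);
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

On Linux the expected result is EFAULT, the error a kernel code path returns when its copy from user space fails.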
Exchanging Data With User Space (2)
You must use dedicated functions such as the following ones in your read and write file operations code:
    #include <linux/uaccess.h>    /* <asm/uaccess.h> on older kernels */
    unsigned long copy_to_user(void __user *to, const void *from, unsigned long n);
    unsigned long copy_from_user(void *to, const void __user *from, unsigned long n);
Make sure that these functions return 0! They return the number of bytes that could not be copied, so any other value means the copy failed (the caller usually returns -EFAULT).
DMA Offload Engine
A DMA (Direct Memory Access) offload engine is a piece of hardware that performs memcpy in hardware, without using the CPU. Example: Intel I/OAT (I/O Acceleration Technology).
It makes the copy the job of an entity other than the CPU.
It's zero copy, if by copy you mean copy by the CPU.
Simple Client/Server Copies
[Diagram]
  Client (Rx): the kernel does a copy to user; the user space application calls ret = recv(s, buf)
  Server (Tx): the user space application calls ret = send(s, buf); the kernel does a copy from user
Simple Client/Server Copies
[Diagram, now including the hardware transfers]
  Client (Rx): DMA into the kernel; the kernel does a copy to user; the user space application calls ret = recv(s, buf)
  Server (Tx): DMA from Disk into the kernel; the user space application calls ret = read(s, buf) and then ret = send(s, buf); the kernel does a copy from user; DMA out of the kernel
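For comparison with the zero-copy calls that follow, the server side of the diagram can be sketched as a plain read()/write() loop. Every byte crosses the user/kernel boundary twice: once into the user buffer and once back out. The function name and the 4096-byte buffer size are assumptions made for this illustration.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Forward everything readable from in_fd to out_fd through a
 * user-space buffer. Returns the number of bytes forwarded,
 * or -1 on error.
 */
long relay_with_copies(int in_fd, int out_fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        /* Copy number one: kernel -> user buffer. */
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;               /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        /* Copy number two: user buffer -> kernel. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Calling relay_with_copies(file_fd, socket_fd) matches the read() followed by send() pair in the diagram; the CPU performs both copies even though the process never inspects the data.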
Zero Copy
An in-kernel buffer that the user has control over.
The buffer is implemented as a set of reference-counted pointers which the kernel copies around without actually copying the data.
splice() moves data to/from the buffer from/to an arbitrary file descriptor.
tee() duplicates data from one buffer to another.
vmsplice() does the same as splice(), but instead of splicing from fd to fd as splice() does, it splices from a user address range into the buffer.
Can be used anywhere a process needs to send something from one end to another, but doesn't need to touch or even look at the data, just forward it.
Zero Copy
An in-kernel buffer that the user has control over, implemented as a pipe.
The pipe buffer is implemented as a set of reference-counted pointers which the kernel copies around without actually copying the data.
tee(), splice() and vmsplice() move data from a user program to the pipe and from one pipe to the next, without copying.
Use when a process needs to send something from one end to another, but doesn't need to touch or even look at the data.
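As an illustration of the "user program to the pipe" direction, here is a minimal sketch of vmsplice(), assuming Linux with glibc. The helper name is hypothetical. With flags set to 0 the caller should leave the memory unmodified until the reader has consumed it, since the pipe may refer to the caller's pages rather than to a private copy.

```c
#define _GNU_SOURCE
#include <fcntl.h>     /* vmsplice() */
#include <string.h>
#include <sys/uio.h>   /* struct iovec */
#include <unistd.h>

/*
 * Splice a user-space string into the write end of a pipe.
 * Returns the number of bytes spliced, or -1 on error.
 */
ssize_t vmsplice_string(int pipe_write_fd, const char *text)
{
    struct iovec iov = {
        .iov_base = (void *)text,   /* start of the user address range */
        .iov_len  = strlen(text),   /* length of the range in bytes */
    };

    /* One iovec segment, no flags. */
    return vmsplice(pipe_write_fd, &iov, 1, 0);
}
```

A reader on the other end of the pipe sees the bytes exactly as if they had been written with write().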
Splice
    long splice(int fd_in, off_t *off_in, int fd_out, off_t *off_out, size_t len, unsigned int flags);
splice() moves data to (from) the pipe from (to) an arbitrary file descriptor.
sendfile() is now internally implemented as splice().
Must use the SPLICE_F_MOVE flag to ask for zero copy. Moving is only possible for whole pages whose buffer has no other references; otherwise the data is copied. (Since Linux 2.6.21 the flag is still accepted but has no effect.)
Other flags: SPLICE_F_NONBLOCK, and SPLICE_F_MORE, which works like TCP_CORK.
Tee
    long tee(int fd_in, int fd_out, size_t len, unsigned int flags);
tee() duplicates data (read: copies references to it) from one pipe buffer to another.
The source pipe still holds the data.
The only useful flag is SPLICE_F_NONBLOCK.
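The point that the source pipe still holds the data can be checked with a few lines. This is a minimal sketch, assuming Linux with glibc; the wrapper name is hypothetical and only adds a descriptive name to a single tee() call.

```c
#define _GNU_SOURCE
#include <fcntl.h>    /* tee(), SPLICE_F_NONBLOCK */
#include <unistd.h>

/*
 * Duplicate up to len bytes that are waiting in the source pipe into
 * the destination pipe without consuming them from the source.
 * Returns the number of bytes duplicated, 0 if the source has no
 * data and no writers, or -1 on error (EAGAIN if nothing is ready).
 */
ssize_t duplicate_pipe_data(int src_read_fd, int dst_write_fd, size_t len)
{
    return tee(src_read_fd, dst_write_fd, len, SPLICE_F_NONBLOCK);
}
```

After the call, reading from either pipe yields the same bytes: the destination got a reference, and the source was left untouched.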
Zero Copy Example 1: Splice() *
[Diagram]
  User space: (no data passes through)
  Kernel memory: File (pointer to page cache page) -> Socket Buf (pointer to page as part of frag list); only the pointer is copied, both refer to the same Data
  Hardware: HD Controller -> Data -> Network Chip; copy (using DMA)
* In reality you have to do two splice calls: one from the file to an intermediate pipe and one from the pipe to the socket buffers.
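The two splice calls mentioned in the footnote can be sketched as follows, assuming Linux with glibc. The function name is hypothetical; out_fd would be a connected socket in the scenario of the diagram, but any descriptor that splice() accepts as a destination works.

```c
#define _GNU_SOURCE
#include <fcntl.h>    /* splice(), SPLICE_F_MOVE */
#include <unistd.h>

/*
 * Send up to len bytes from in_fd (a regular file, read from its
 * current offset) to out_fd through an intermediate pipe.
 * Returns the number of bytes delivered, or -1 on error.
 */
ssize_t splice_file_to_fd(int in_fd, int out_fd, size_t len)
{
    int p[2];
    ssize_t total = 0;

    if (pipe(p) == -1)
        return -1;

    while (len > 0 && total >= 0) {
        /* First splice: file -> pipe (at most one pipe-full). */
        ssize_t in = splice(in_fd, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (in < 0)
            total = -1;
        if (in <= 0)
            break;                      /* error or end of file */
        len -= (size_t)in;

        /* Second splice: pipe -> destination, until the pipe is empty. */
        while (in > 0) {
            ssize_t out = splice(p[0], NULL, out_fd, NULL,
                                 (size_t)in, SPLICE_F_MOVE);
            if (out <= 0) {
                total = -1;
                break;
            }
            in -= out;
            total += out;
        }
    }

    close(p[0]);
    close(p[1]);
    return total;
}
```

No user-space buffer appears anywhere in this function: the process only tells the kernel where the data should flow.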
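Tee Implemented using Tee & Splice
    #define _GNU_SOURCE
    #include <assert.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Both stdin and stdout must be pipes, e.g.: producer | ./tee out.log | consumer */
    int main(int argc, char *argv[])
    {
        int fd;
        int len, slen;

        assert(argc == 2);

        fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1) {
            perror("open");
            exit(EXIT_FAILURE);
        }

        do {
            /*
             * tee stdin to stdout.
             */
            len = tee(STDIN_FILENO, STDOUT_FILENO, INT_MAX, SPLICE_F_NONBLOCK);
            if (len < 0) {
                if (errno == EAGAIN)
                    continue;       /* no data available yet: try again */
                perror("tee");
                exit(EXIT_FAILURE);
            } else if (len == 0) {
                break;              /* end of input */
            }

            /*
             * Consume stdin by splicing it to the file.
             */
            while (len > 0) {
                slen = splice(STDIN_FILENO, NULL, fd, NULL, len, SPLICE_F_MOVE);
                if (slen < 0) {
                    perror("splice");
                    break;
                }
                len -= slen;
            }
        } while (1);

        close(fd);
        exit(EXIT_SUCCESS);
    }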