Virtual 16 Bit Precise Operations on RGBA8 Textures

R. Strzodka

Numerical Analysis and Scientific Computing

University of Duisburg, Germany

Email: [email protected]

Abstract

There is a growing demand for high precision texture formats fed by the increasing number of textures per pixel and multi-pass algorithms in dynamic texturing and visualization. Therefore support for wider data formats in graphics hardware is evolving. The existing functionality of current graphics cards, however, can already be used to provide higher precision textures. This paper shows how to emulate a 16 bit precise signed format by use of RGBA8 textures and existing shader and register operations. Thereby a 16 bit number is stored in two unsigned 8 bit color channels. The focus lies on a 16 bit signed number format which generalizes existing 8 bit formats allowing lossless format expansions, and which has an exact representation of $1$, $0$ and $-1$ allowing stable long-lasting dynamic texture updates. Implementations of basic arithmetic operations and dependent texture look-ups in this format are presented and example algorithms dealing with 16 bit precise dynamic updates of displacement maps, normal textures and filters demonstrate some of the resulting application areas.

1 Introduction

The programmability of graphics hardware and the set of available operations has been growing rapidly in recent years. Researchers make use of this situation either by extending hardware algorithms and designing new applications which still are supported by new graphics features or by analysing software based graphics packages and extracting or simplifying parts, such that these can be mapped on the new graphics hardware functionality. Both cases have in common that a growing number of complex multi-pass algorithms try to make the best possible use of the available resources and the applications are becoming computationally more and more demanding.

Even when the set of available operations was more restricted various algorithms were designed to exploit graphics features for computations [1, 4, 6]. With the wider availability of extensions like multi-texturing and pixel textures the area of applications widened strongly reaching from lighting and shading computations [7, 8, 12, 13] to various image processing applications [5, 9, 10, 11, 20] and advanced hardware accelerated shading languages [14, 15]. The author himself has implemented complicated numerical schemes solving parabolic differential equations fully in graphics hardware [16, 17].

In all these applications dealing with multi-passes and multiple textures there is a concern about the inevitably occurring error due to the low resolution of the color channels which usually consist of only 8 bits. Especially when dealing with high dynamic range images [2, 18] several color channels are used together to handle this problem. New HILO texture formats introduced by NVIDIA [3] are also meant to provide higher precision texturing, in particular for lighting computations. These solutions, however, are restricted to certain applications, so that they cannot be used for arbitrary high precision computations or visualizations. We intend to overcome this difficulty by demonstrating that the existing extensions already allow the introduction of a composite 16 bit format on RGBA8 textures, on which the same operations as on the low precision formats can be applied. The idea behind this emulation, however, is not as much to provide a complete support for 64 bit rendering, as this will be covered by future graphics hardware more efficiently, but rather to allow an easy and bandwidth efficient concurrent usage of 8 and 16 bit rendering such that the higher precision format can be used to quickly resolve partial accuracy problems on today's 8 bit architectures. Similarly, in the presence of a native 16 bit format this approach would allow the emulation of a 32 bit format.

VMV 2002, Erlangen, Germany, November 20–22, 2002

The techniques of implementing high-precision arithmetic from low-precision building-blocks as such are elaborate and have been used in countless architectures. The problem of doing this in graphics hardware, however, lies in the very restricted availability of conditional statements on a per fragment basis. The extensions of pixel shaders and register combiners available on a GeForce3 seemed to offer the highest flexibility for this purpose and have been used for the implementations. Theoretically the alpha-test and some kind of dependent texture access would also suffice to obtain the same functionality, however, the performance would suffer heavily due to numerous passes. But as the implementations of the emulated operations are independent and do not all rely on the same graphics features, some of them may also be efficiently realized in a more restricted setting.

We will first review the different number formats under OpenGL and explain the choice of a composite 16 bit format. The following main section will then present the implementation of the arithmetic and dependent texture operations and describe example usage.

2 Number Formats

In this section we discuss the different fixed-point number representations in OpenGL and which conditions would be desirable for a new signed 16 bit number format.

Currently standard OpenGL knows only an unsigned 8 bit fixed-point format, but the growing arithmetic within the texture environments tends towards a signed 9 bit format as used by the register combiners. Recently NVIDIA introduced a signed 8 bit format and a signed and unsigned 16 bit format. Unfortunately switching from a lower to a higher precision format does not always imply that all numbers in the lower precision format can be exactly represented in the higher one. This problem is not specific to the OpenGL setting, but a general difficulty in defining fixed-point representations. The situation is even more confusing as the unsigned formats may be re-interpreted as signed numbers by the mapping $x \mapsto 2x - 1$, which is available at some stages in the graphics pipeline. Table 1 gives an overview of the different formats.

We see that the unsigned 8 bit format represents a subset of the unsigned 16 bit format and that the signed 8 bit represents a subset of the signed 16 bit format, so that these pairs of formats can be used together effectively. But the represented numbers from the signed and unsigned formats have almost nothing in common, so that conversions between these would inevitably lead to loss of precision, which would prohibit such conversions in accumulating texture updates. Therefore we will require the new composite format to be a superset of all of the lower precision formats. Naturally the other 16 bit formats cannot be generalized by a format with the same resolution.

The best generalization so far of signed and unsigned formats is given by the signed 9 bit format which is a superset of both the unsigned 8 bit and the normal expansion of the unsigned 8 bit format. Moreover, the signed 9 bit format has the advantage that it exactly represents the neutral elements of addition and multiplication, $0$ and $1$, as well as $-1$, which is very important for long-lasting dynamic texture updates. If, for example, we dynamically change a texture every other frame using additions and multiplications, but want some areas of the texture to remain unchanged for some time or even throughout the process, we must rely on the exact representation of 0 and 1 or else we would have to store the information about every region to be protected somewhere and use some sort of fragment test to leave them unchanged. By generalizing the signed 9 bit format as required above, we will automatically transfer this property of exact representation of $-1, 0, 1$ to the composite format.

Additionally we would want the first 8 bit color channel of the composite 16 bit number representation to be, on its own, the best possible approximation of the signed 16 bit number. This would allow us to use only the first part in cases where the full precision is not required or difficult to use. Finally we should choose a format which requires only few operations to perform the carry-over arithmetic necessary for 16 bit operations performed on signed 8 bit multipliers and adders. Thus, we may summarize the conditions for the desired signed 16 bit format as follows:

• The composite number format should be a superset of all lower precision formats from Table 1.
• Its first 8 bit channel should be the best possible approximation to the whole signed 16 bit number.
• The format should allow an efficient implementation of the carry-over arithmetic.

Table 1: Comparison of number formats in OpenGL.

format                      formula           range of a
unsigned 8 bit              a/255             [0, 255]
unsigned 16 bit             a/65535           [0, 65535]
unsigned 8 bit, expanded    (2a - 255)/255    [0, 255]
signed 9 bit                a/255             [-256, 255]
signed 8 bit                a/128             [-128, 127]
signed 16 bit               a/32768           [-32768, 32767]

A format which fulfills these conditions can be defined on a RGBA8 texture in the following way. We let $(R, A)$ represent the first signed 16 bit number and $(G, B)$ the second. The representation of a fixed-point number is given by:

$$\rho(n, m) := \frac{1}{255}\Big[(2n - 255) + \frac{1}{128}(m - 128)\Big] = \frac{1}{255 \cdot 128}\big(256\,(n - 128) + m\big),$$

$$\rho(R, A) := (2R - 1) + \frac{1}{128}\Big(A - \frac{1}{2}\Big) = \Big(2R - \frac{256}{255}\Big) + \frac{A}{128}.$$

The first row defines the correspondence for integer $n$ and $m$, and the second for fixed-point $R = n/255$ and $A = m/255$ (1 corresponds to 255 and $\frac{1}{2}$ to 128). From the first formula we see that our composite format generalizes the lower precision formats from Table 1, as it produces all numerators from $-255 \cdot 128$ to $+255 \cdot 128$ for the common denominator $255 \cdot 128$. Moreover, by definition the mapping of the first color channel R corresponds to the normal expansion of an unsigned 8 bit format giving the best possible approximation to the whole 16 bit number. Thus the format fulfills the first two conditions given above. The satisfaction of the third condition will become clear in Section 3 where we will present the exact implementations of arithmetic operations.

At the end of this section we should look at possible drawbacks of this format. The problem with fixed-point numbers having an exact representation of $-1, 0, 1$ is an unavoidable representation of numbers outside of the range $[-1, 1]$. In case of the signed 9 bit format this is $-\frac{256}{255}$ and in our case all numbers with $n = 255,\ m > 128$ and $n = 0,\ m < 128$. It would be very unpleasant having to define an external format with a resolution dependent number range. This problem is well known and has influenced the decision against such external formats as discussed in NVIDIA's texture shader specification [3]. But as in the case of the signed 9 bit format additional clamping operations would solve the main disadvantage of over-representation and would gain smoother transitions between the existing formats.
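The composite format of this section can be checked with a few lines of exact rational arithmetic. The following is only a sketch: the function names `decode` and `encode` are hypothetical, the integer form $\big(256(n-128)+m\big)/(255\cdot128)$ is taken from the first formula above, and the clamping to $[-1, 1]$ inside `encode` is an assumption about how over-represented values would be handled.

```python
from fractions import Fraction

DENOM = 255 * 128  # common denominator of the composite format


def decode(n, m):
    """Value of the byte pair (n, m), both bytes in 0..255."""
    return Fraction(256 * (n - 128) + m, DENOM)


def encode(value):
    """Nearest byte pair for a value; the input is clamped to [-1, 1] (assumption)."""
    value = max(Fraction(-1), min(Fraction(1), Fraction(value)))
    k = round(value * DENOM)            # integer numerator in -32640..32640
    return divmod(k + 256 * 128, 256)   # solves k = 256 * (n - 128) + m


# The special values -1, 0 and 1 have exact byte pairs.
print(encode(-1), encode(0), encode(1))
```

Under these assumptions the three printed pairs are `(0, 128)`, `(128, 0)` and `(255, 128)`, and pairs such as `(255, 255)` decode to values slightly above 1, which is the over-representation discussed above.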

3 Operations

In this section we will present how the basic arithmetic operations addition, subtraction, multiplication, division and dependent texture look-ups can be realized at 16 bit precision with our composite number format in RGBA8 textures. Short examples will demonstrate possible uses of the operations.

While the predefined 16 bit formats can only be used in few special operations and their values must be uploaded from main memory, the operations from this section will exhibit the main advantage of the new composite format by allowing dynamic changes to the operands.

All occurring textures will be two dimensional. They will usually contain the first 16 bit channel in the colors $(R, A)$ and the second in $(G, B)$, where $R, G, B, A \in [0, 1]$ represent fixed-point values in 8 bit. This choice will become clear in Section 3.3. Textures will be seen as two-dimensional four-valued mappings: $T : (x, y) \mapsto T[RGBA](x, y)$. The necessary operations in the pixel shaders and register combiners will be given in pseudo-code notation, where '↦' means is mapped to, '→' means is stored in, '•' denotes the dot-product and '·' the component-wise multiplication.


3.1 Dependent Texture Look-Ups

Let $I$ be any texture which should be accessed via a dynamically generated displacement map $T$ with the 16 bit x-displacement in $(R, A)$ and the 16 bit y-displacement in $(G, B)$. Then a quick 8 bit precise texture offset $I(x + s \cdot T[R],\ y + s \cdot T[G])$ can be realized in pixel shaders through:

    I(x + s·T[R], y + s·T[G])
    0: tex2d                (x, y) ↦ T(x, y)
    1: dot2d (blue to 1)    x' = (2s, 0, x − s) • (T[R], T[G], 1) = x + s·(2T[R] − 1)
    2: lut2d (blue to 1)    y' = (0, 2s, y − s) • (T[R], T[G], 1) = y + s·(2T[G] − 1)
                            ↦ I(x', y')

where $s$ is a user set scaling factor, which determines the maximal possible offset.
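The 8 bit precise offset look-up can be emulated on the CPU to make the arithmetic explicit. This is a minimal sketch, not the pixel shader setup itself: the function name `offset_lookup`, the nested-list texture layout, the measurement of offsets in texels, nearest-neighbour sampling and the repeat wrap mode are all assumptions made for illustration.

```python
def offset_lookup(image, disp, x, y, s):
    """Fetch image at (x, y) displaced by the first channels of disp.

    image: 2D list of values; disp: 2D list of (R, G, B, A) byte tuples.
    Only the high bytes R and G are used, expanded by v -> 2v - 1.
    """
    height, width = len(image), len(image[0])
    r, g = disp[y][x][0], disp[y][x][1]
    dx = s * (2 * r / 255 - 1)          # x + s * (2 T[R] - 1)
    dy = s * (2 * g / 255 - 1)          # y + s * (2 T[G] - 1)
    xs = round(x + dx) % width          # repeat wrap mode (assumption)
    ys = round(y + dy) % height
    return image[ys][xs]


image = [[10 * row + col for col in range(4)] for row in range(4)]
disp = [[(255, 0, 0, 0)] * 4 for _ in range(4)]   # offset (+s, -s) everywhere
print(offset_lookup(image, disp, 0, 0, 1))
```

With the uniform displacement above every texel is fetched one step to the right and one step up, wrapping around the border.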

Example: Brownian motion. Let $V$ be a vector field with random x-components in $(R, A)$, random y-components in $(G, B)$ and the wrap mode set to repeat, and $T$ a displacement map as above initially set to zero. Then the following short algorithm will produce a random local motion in the texture $I$:

    n_x = random(·, ·);
    n_y = random(·, ·);
    V' = V(x + n_x·V[R], y + n_y·V[G]);
    T += δ·V';
    R = I(x + T[R], y + T[G]);

where $\delta$ steers the speed of the motion and $R$ holds the resulting texture in each step. Although the look-up itself takes place in only 8 bit, the motion can vary across the texture with 16 bit since the displacement $T$ is stored and calculated in 16 bit. The implementation of 16 bit precise addition and multiplication is shown in the following subsections. In particular, the user may vary $\delta$ within a bigger range, without having to fear that the motion stops altogether, because the multiplication with $\delta$ evaluates to zero.

As the predefined texture offset operations require a DSDT or HILO input format we would currently need two separate textures for x-displacement $T_x[GB]$ and y-displacement $T_y[GB]$ with $T_x[R]$ and $T_y[R]$ set to one for a dynamic 16 bit precise look-up in pixel shaders:

    I(x + s·T_x[GB], y + s·T_y[GB])
    0: tex2d    (x, y) ↦ T_x(x, y)
    1: tex2d    (x, y) ↦ T_y(x, y)
    2: dot2d    x' = (x − (256/255)·s, 2s, s/128) • (1, T_x[G], T_x[B]) = x + s·ρ(T_x[G], T_x[B])
    3: lut2d    y' = (y − (256/255)·s, 2s, s/128) • (1, T_y[G], T_y[B]) = y + s·ρ(T_y[G], T_y[B])
                ↦ I(x', y')

where $s$ again scales the offset. Here we could also obtain an absolute and not an offset-texture look-up by eliminating the coordinates $x$ and $y$ in the pixel shaders 2 and 3. Then $T_x, T_y$ would address $I$ absolutely, i.e. the result would be $I(T_x[GB], T_y[GB])$. Such a construction can be used to evaluate an arbitrary function $f$ of two 16 bit variables, $f(P, Q) = I(P[GB], Q[GB])$. The precision of this evaluation corresponds directly to the size of the texture $I$ which holds the resulting values. In particular we could implement a division between two 16 bit numbers in this way, but naturally the resulting range would still be confined to the same interval for all pixel values in an image, i.e. division by small numbers really requires floating-point formats.

We should also emphasize that the above use of the dot-product 2d operation, where the first part (shader 2) accesses a different previous texture, namely $T_x$, than the second, $T_y$, is uncommon but legitimate.
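That a single dot product suffices to form the 16 bit precise coordinate can be verified with exact fractions. The sketch below assumes the composite value $\rho(G, B) = (2G - \frac{256}{255}) + \frac{B}{128}$ from Section 2 and uses hypothetical names (`rho`, `dot_coordinate`); it checks the identity $(x - \frac{256}{255}s,\ 2s,\ \frac{s}{128}) \bullet (1, G, B) = x + s\,\rho(G, B)$ rather than modelling the hardware.

```python
from fractions import Fraction


def rho(g, b):
    """Composite value of the byte pair (g, b), first byte g, second byte b."""
    return Fraction(256 * (g - 128) + b, 255 * 128)


def dot_coordinate(x, s, g, b):
    """Texture coordinate produced by the dot product with (1, G, B)."""
    x, s = Fraction(x), Fraction(s)
    vec = (x - Fraction(256, 255) * s, 2 * s, s / 128)
    prev = (Fraction(1), Fraction(g, 255), Fraction(b, 255))  # R set to one
    return sum(v * p for v, p in zip(vec, prev))


# Spot check of the identity for one texel.
print(dot_coordinate(3, 2, 200, 17) == 3 + 2 * rho(200, 17))
```

The comparison prints `True`; the test below repeats the check over a grid of byte values.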

Example: Advection. Let $V_x[GB]$ be the x-component and $V_y[GB]$ the y-component of a continuous vector field $V$. Likewise let $T_x[GB]$ and $T_y[GB]$ be the x and y-components of a displacement map initially set to zero. Then the following short algorithm will produce an advection of the texture $I$ along the vector field $V$:

    T_x += δ·V_x(x − T_x[GB], y − T_y[GB]);
    T_y += δ·V_y(x − T_x[GB], y − T_y[GB]);
    R = I(x − T_x[GB], y − T_y[GB]);

where $\delta$ again steers the speed of the motion and $R$ holds the resulting advected texture in each step. If we replaced $V_x$ by $T_x$ and $V_y$ by $T_y$ we would obtain a self-advection of $T_x, T_y$, which is a step towards fluid dynamics. In this manner we could simulate the motion of gas or water if we also forced $(T_x, T_y)$ to be divergence free as required by the incompressible Navier-Stokes equations; for more details we refer to [19].

3.2 Addition and Subtraction

Let the texture tex0 hold a pair of 16 bit precise x and y coordinates in (tex0[R], tex0[A]) and (tex0[G], tex0[B]) respectively, and the texture tex1 another pair in (tex1[R], tex1[A]), (tex1[G], tex1[B]). For the x and y coordinate let $(s_x, s_y) \in \{-1, 1\} \times \{-1, 1\}$ encode independently whether an addition ($1$ set) or a subtraction ($-1$ set) should be performed. Then the two simultaneous 16 bit precise additions or subtractions of the x and y coordinates can be mapped onto the register combiner functionality:

    tex0[RA, GB] + (s_x, s_y)·tex1[RA, GB]
    [register combiner listing: general combiners 0 to 4 with their RGB and A portions; the result is stored in tex0]

The result of the two parallel 16 bit precise additions or subtractions lies in tex0 and can be further processed by more register combiners or lighting operations in the final combiner. For clarity of presentation input mappings for constants are not explicitly given, but the ranges of the color channels have been chosen such that there always exists an appropriate mapping. Moreover, due to the number of occurring constants we use the NV_register_combiners2 extension providing combiner dependent constants.

Each 16 bit addition above is emulated by performing a componentwise addition on the color channels, then checking the sum of the lower component for an overflow and finally correcting both components appropriately. The great advantage of the introduced composite number format is that only one such check is necessary to handle both positive and negative overflow for both addition and subtraction. In this way our format fulfills the third condition required in Section 2. This efficient carry-over arithmetic is due to the fact that the initial input mapping $x \mapsto x - \frac{1}{2}$ for the addends is not the correct mapping ($x \mapsto 2x - 1$) for the first color channel in our representation. The resulting difference introduces an error which cannot be represented in the first color channel. But the appropriate correction applied to the second color channel corrects a possible overflow therein and the sum of these corrections can be represented in the first channel. Therefore only one condition has to be checked to decide in which direction a correction on the second channel should take place.

The reason for the awkward repeated negation in combiners 1[RGB], 2[RGB] and similarly in 3[RGB], 4[RGB] and 4[A] with an intermediate $(-\frac{1}{2})$ output mapping is an effort to implicitly realize an addition of $\frac{1}{2}$ although there is no such input or output mapping. It is necessary for the re-encoding of the results from the signed range $[-\frac{1}{2}, \frac{1}{2})$ back to $[0, 1]$.

Example: Rotation of normals. Let the texture $N_x$ define the x-component and the texture $N_y$ the y-component of a normal map, whereas the z-component is implicitly defined if we think of the normals to be of unit length. Then we can use the following algorithm to rotate the normals around the z-axis.

    N_x = (1 − α)·N_x + α·N_y;
    N_y = (1 − α)·N_y − α·N_x;
    R = F(N_x[G], N_y[G]);

where $\alpha$ is a constant steering the speed of the rotation and $F$ is a texture which, addressed by the main components of $N_x$ and $N_y$, delivers the normal with the computed z-component $\sqrt{1 - N_x[G]^2 - N_y[G]^2}$.
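The carry-over idea of this subsection can be sketched in integer arithmetic on the two bytes of a number. This is not the register combiner sequence of the paper: it uses an ordinary integer carry (`divmod`) instead of the single comparison described above, it assumes the integer form $256\,(n - 128) + m$ of the numerator from Section 2, and the names `numerator` and `add16` as well as the clamping to $\pm 1$ are hypothetical choices.

```python
LIMIT = 255 * 128  # numerator of the value 1


def numerator(pair):
    """Integer numerator 256 * (n - 128) + m of a byte pair (n, m)."""
    return 256 * (pair[0] - 128) + pair[1]


def add16(p, q, sign=1):
    """Add (sign=+1) or subtract (sign=-1) two byte pairs with carry-over."""
    hi = (p[0] - 128) + sign * (q[0] - 128)   # sum of the first channels
    lo = p[1] + sign * q[1]                   # sum of the second channels
    carry, lo = divmod(lo, 256)               # carry is -1, 0 or +1
    hi += carry                               # correct the first channel
    k = 256 * hi + lo
    if k > LIMIT:                             # clamp to +1 (assumption)
        return (255, 128)
    if k < -LIMIT:                            # clamp to -1 (assumption)
        return (0, 128)
    return (hi + 128, lo)


print(add16((128, 200), (128, 100)))         # low bytes overflow into the high byte
print(add16((128, 10), (128, 20), sign=-1))  # low bytes underflow
```

The two calls print `(129, 44)` and `(127, 246)`, i.e. the numerators 300 and -10, so a positive and a negative carry are both handled by the same correction step.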


3.3 Functions and Multiplication

We have suggested to arrange the two 16 bit numbers into the color channel pairs $(R, A)$ and $(G, B)$ because these pairs can be used for a dependent texture look-up. Such look-ups implement the application of arbitrary functions on our composite 16 bit format and within the 4 pixel shaders both 16 bit channels can be mapped by a different function. Let $T$ be again a displacement map with an x and y component as before, and let $F_x$ and $F_y$ be $256 \times 256$ textures encoding nonlinear functions on the composite 16 bit format. Then we can apply $F_x$ and $F_y$ simultaneously to $T$ using pixel shaders:

    F_x(T[RA]), F_y(T[GB])
    0: tex2d    (x, y) ↦ T(x, y)
    1: ar2d     (T[A], T[R]) ↦ F_x(T[RA])
    2: gb2d     (T[G], T[B]) ↦ F_y(T[GB])

The textures $F_x$ and $F_y$ should contain the values such that addressed by AR, where R is the first color channel in the number representation, they deliver the function value in AR and addressed by GB they deliver it in GB. If it is clear that a function need not be used in both dependent modes, then a single texture $F$ storing the values of both $F_x$ and $F_y$ would suffice. One could evade this difficulty by storing the first channel of the number representation in A instead of R, but this would imply many more difficulties in other operations.
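Applying a function through a $256 \times 256$ table addressed by the two bytes of a number can be sketched as follows. The names `build_table` and `apply_tables`, the indexing order `table[first byte][second byte]`, the use of floating point to fill the table and the clamping of results to $[-1, 1]$ are assumptions for illustration; the hardware addresses the texture via the AR and GB dependent look-ups described above.

```python
DENOM = 255 * 128


def decode(n, m):
    """Composite value of the byte pair (n, m)."""
    return (256 * (n - 128) + m) / DENOM


def encode(value):
    """Nearest byte pair, with the value clamped to [-1, 1] (assumption)."""
    value = max(-1.0, min(1.0, value))
    return divmod(round(value * DENOM) + 256 * 128, 256)


def build_table(func):
    """table[n][m] holds the encoded value of func at the number (n, m)."""
    return [[encode(func(decode(n, m))) for m in range(256)] for n in range(256)]


def apply_tables(texel, table_x, table_y):
    """Map the (R, A) number through table_x and the (G, B) number through table_y."""
    r, g, b, a = texel
    r_new, a_new = table_x[r][a]
    g_new, b_new = table_y[g][b]
    return (r_new, g_new, b_new, a_new)


square = build_table(lambda v: v * v)
identity = build_table(lambda v: v)
# First number is -1 (bytes R=0, A=128), second number is 0 (bytes G=128, B=0).
print(apply_tables((0, 128, 0, 128), square, identity))
```

Here the first number is squared to 1 while the second passes through unchanged, so the call prints `(255, 128, 0, 128)`.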

Example: Linear filters. In the former examples we have used multiplications of the form $\delta \cdot V$ where $\delta$ is a user defined constant and $V$ an intermediate texture result. To implement such a multiplication in 16 bit precision one defines a texture $I_\delta$ containing the product values of arbitrary 16 bit values with $\delta$ and applies it to $V$, obtaining $(I_\delta[RA](V[RA]),\ I_\delta[GB](V[GB]))$. Since the application of the function uses only the pixel shaders and the addition only the register combiners, one can perform an operation like $T + \delta \cdot V$ in one pass. In particular one can quickly implement a 16 bit precise filter using a 3 by 3 stencil:

$$R = \sum_{i_x, i_y = -1}^{1} \alpha_{i_x, i_y} \cdot I(x + i_x,\ y + i_y),$$

where $\alpha_{i_x, i_y}$ are the filter coefficients and $R$ contains the filtered texture $I$. If each of the coefficients is different, which is seldom the case, one would need 9 textures $I_{\alpha_{i_x, i_y}}$ encoding the values of a multiplication with $\alpha_{i_x, i_y}$ and also 9 passes to compute the result $R$. But as all computations are performed in 16 bit, the result is significantly better than in 8 bit, especially for small coefficients.
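The gain for small coefficients can be illustrated by rounding every product and partial sum of a 3 by 3 stencil to a fixed-point grid. This is a rough sketch, not a model of the hardware pipeline: the helper names `quantize` and `filter_pixel` are hypothetical, the 8 bit case is modelled by a grid of step $1/255$ and the composite format by a grid of step $1/(255 \cdot 128)$, and the sample values are arbitrary.

```python
def quantize(value, denom):
    """Round a value to the nearest multiple of 1/denom."""
    return round(value * denom) / denom


def filter_pixel(values, coeffs, denom):
    """Weighted sum over a stencil with rounding after every operation."""
    total = 0.0
    for v, c in zip(values, coeffs):
        total = quantize(total + quantize(c * v, denom), denom)
    return total


values = [0.5] * 9            # nine neighbouring texel values
coeffs = [1.0 / 9.0] * 9      # averaging stencil with small coefficients
exact = sum(c * v for v, c in zip(values, coeffs))
err8 = abs(filter_pixel(values, coeffs, 255) - exact)
err16 = abs(filter_pixel(values, coeffs, 255 * 128) - exact)
print(err8 > err16)
```

The comparison prints `True`: on the coarse grid each product loses a noticeable fraction of a quantization step, and a coefficient such as 0.001 applied to the value 1 would round to zero altogether, whereas it survives on the finer grid.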

In terms of hardware resources, the multiplication is a much more complex operation than the addition and therefore more difficult to emulate using lower precision computing blocks. The starting point is the decomposition of the 16 bit product into a sum of 8 bit products. Let $U$ and $T$ be two textures encoding the 16 bit numbers to be multiplied in the colors $(G, B)$. First multiplying the representations of $U[GB]$ and $T[GB]$ we obtain:

$$\rho(U[G], U[B]) \cdot \rho(T[G], T[B]) = (2U[G] - 1) \cdot (2T[G] - 1) + \frac{1}{128}\Big[(2U[G] - 1) \cdot \big(T[B] - \tfrac{1}{2}\big) + (2T[G] - 1) \cdot \big(U[B] - \tfrac{1}{2}\big)\Big] + \frac{1}{128^2}\Big[\big(U[B] - \tfrac{1}{2}\big) \cdot \big(T[B] - \tfrac{1}{2}\big)\Big]$$

The first addend of the result can be evaluated at 16 bit to $M(U[G], T[G])$ by a texture look-up with the first components $U[G]$ and $T[G]$ addressing a multiplication table $M$. The second addend may be computed and rounded by the register combiners, while the third gives at most $2^{-16}$, which is less than one half of the smallest representable number $\frac{1}{255 \cdot 128}$, and thus will be ignored. In this way we can implement a one-pass texture-texture multiplication in 16 bit precision, but unlike the addition only one of the 16 bit channels can be multiplied at once.

    U[G]·T[G]
    0: tex2d    (x, y) ↦ U(x, y)
    1: tex2d    (x, y) ↦ T(x, y)
    2: dot2d    x' = (0, 1, 0) • U[RGB] = U[G]
    3: lut2d    y' = (0, 1, 0) • T[RGB] = T[G]
                ↦ M(x', y')

The resulting textures tex0, tex1 and tex3 are now further processed in the register combiners to compute the mixed products and sum up the addends of the multiplication formula.

    tex3[GB] + tex0[G]·tex1[B] + tex1[G]·tex0[B]
    [register combiner listing: general combiners 0 to 7 with their RGB and A portions]

The main calculations take place in combiner 1[A] where the sum of the mixed products is computed, and in combiner 5[RGB] where the former sum is added to the result of the multiplication table. Combiners 3[RGB], 4[RGB] and 6[RGB], 7[RGB] perform again the carry-over arithmetic as in the case of addition and subtraction. They have different correction directions, because of the implicit 0 in tex0[G] due to combiner 2[RGB].
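The decomposition of a 16 bit product into 8 bit products can be checked with exact fractions. The sketch assumes the splitting of a number into a first part $2G - 1$ and a second part $B - \frac{1}{2}$, where $\frac{1}{2}$ stands for the fixed-point value $128/255$ as in Section 2; the names `parts`, `rho` and `product_terms` are hypothetical and the code does not model the table look-up or the combiner rounding.

```python
from fractions import Fraction

HALF = Fraction(128, 255)  # fixed-point value written as 1/2 in the text


def parts(n, m):
    """First and second part of the number stored in the bytes (n, m)."""
    hi = Fraction(2 * n - 255, 255)     # 2G - 1
    lo = Fraction(m, 255) - HALF        # B - 1/2
    return hi, lo


def rho(n, m):
    """Composite value: first part plus second part scaled by 1/128."""
    hi, lo = parts(n, m)
    return hi + lo / 128


def product_terms(p, q):
    """The three addends of rho(p) * rho(q)."""
    h1, l1 = parts(*p)
    h2, l2 = parts(*q)
    first = h1 * h2                      # multiplication table look-up
    second = (h1 * l2 + h2 * l1) / 128   # mixed products
    third = l1 * l2 / (128 * 128)        # neglected term
    return first, second, third


p, q = (200, 40), (90, 250)
first, second, third = product_terms(p, q)
print(first + second + third == rho(*p) * rho(*q))
```

The check prints `True`. In this sketch the neglected third addend never exceeds $1/255^2$ in magnitude, which is below one step $\frac{1}{255 \cdot 128}$ of the composite format.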

Example: Nonlinear filters. In the last example we have seen how a texture can be filtered with a stencil of constant coefficients. If we use several textures instead of the constants, the coefficients may vary depending on the coordinates and we obtain a nonlinear filter:

$$R = \sum_{i_x, i_y = -1}^{1} W_{i_x, i_y}(x, y) \cdot I(x + i_x,\ y + i_y),$$

where $W_{i_x, i_y}$ are now textures containing the varying weights of the filter for each direction $(i_x, i_y)$. Nonlinear filters can be effectively used for edge sensitive denoising of images. Figure 1 shows the advantages of the increased precision in this application.

3.4 Performance

Apart from the multiplication the new operations on the composite 16 bit format will perform at almost 50% of the normal speed. Using the dot-product operation instead of the offset-texture for dependent texture look-ups costs a factor of 2. The 5 combiners of the addition would normally cost a factor of 3, but since some sort of the much slower dependent texture access will usually precede the addition (a multiplication with a constant for example) multiple register combiners seldom reduce overall performance. Finally, the multiplication is comparably slow because the dot-product operation takes 8 times longer than a normal texture access. This factor, however, is not surprising as the complexity of a multiplication grows quadratically with the bit length of the operands, so it is rather amazing that it can be realized in a single pass at all. Moreover, other time consuming procedures such as texture object switching or implicit pipeline flushing may absorb these theoretical extra costs, as has been experienced in the filter example (Figure 1).

4 Conclusions

A composite 16 bit number format has been presented on which precise arithmetic and dependent texture operations can be efficiently performed. In particular this format allows dynamic accurate changes to displacement maps, normals and filters. These high precision operations naturally require more texture memory and computing time, but are still fast enough to be used in precision sensitive parts of real-time multi-pass algorithms. The details of this 16 bit emulation may seem deterrent at first, however, once implemented the operations can be used in a simple modular way. We hope that by use of this virtual 16 bit format more precision sensitive visualization and computing can be accelerated in graphics hardware.

References

[1] Brian Cabral, Nancy Cam, and Jim Foran. Accelerated volume rendering and tomographic reconstruction using texture mapping hardware. In Arie Kaufman and Wolfgang Krueger, editors, 1994 Symposium on Volume Visualization, pages 91–98. ACM SIGGRAPH, 1994. ISBN 0-89791-741-3.

[2] Jonathan Cohen, Chris Tchou, Tim Hawkins, and Paul Debevec. Real-time high dynamic range texture mapping. In Proceedings of the Eurographics Rendering Workshop 2001, 2001.

[3] NVIDIA Corporation. NVIDIA OpenGL extension specifications. Technical report, NVIDIA Corporation, 2001.

[4] Paul J. Diefenbach and Norman I. Badler. Multi-pass pipeline rendering: Realism for dynamic environments. In Michael Cohen and David Zeltzer, editors, 1997 Symposium on Interactive 3D Graphics, pages 59–70. ACM SIGGRAPH, 1997. ISBN 0-89791-884-3.

[5] U. Diewald, T. Preusser, M. Rumpf, and R. Strzodka. Diffusion models and their accelerated solution in computer vision applications. Acta Mathematica Universitatis Comenianae (AMUC), 70(1):15–31, 2001.

[6] Paul Haeberli and Mark Segal. Texture mapping as a fundamental drawing primitive. In Michael F. Cohen, Claude Puech, and Francois Sillion, editors, Fourth Eurographics Workshop on Rendering, pages 259–266. Eurographics, June 1993. Held in Paris, France, 14–16 June 1993.

[7] W. Heidrich, R. Westermann, H.-P. Seidel, and T. Ertl. Applications of pixel textures in visualization and realistic image synthesis. In ACM Symposium on Interactive 3D Graphics. ACM/Siggraph, 1999.

[8] Wolfgang Heidrich and Hans-Peter Seidel. Realistic, hardware-accelerated shading and lighting. In Alyn Rockwood, editor, Siggraph 1999, Annual Conference Proceedings, Annual Conference Series, pages 171–178. ACM Siggraph, Addison Wesley Longman, 1999.

[9] Kenneth E. Hoff III, John Keyser, Ming Lin, Dinesh Manocha, and Tim Culver. Fast computation of generalized Voronoi diagrams using graphics hardware. Computer Graphics, 33(Annual Conference Series):277–286, 1999.

[10] M. Hopf and T. Ertl. Accelerating 3d convolution using graphics hardware. In Proc. Visualization '99, pages 471–474. IEEE, 1999.

[11] M. Hopf and T. Ertl. Hardware Accelerated Wavelet Transformations. In Proceedings of EG/IEEE TCVG Symposium on Visualization VisSym '00, pages 93–103, 2000.

[12] Jan Kautz and Michael D. McCool. Interactive rendering with arbitrary BRDFs using separable approximations. In ACM, editor, SIGGRAPH 99. Proceedings of the 1999 SIGGRAPH annual conference: Conference abstracts and applications, Computer Graphics, pages 253–253. ACM Press, 1999.

[13] Michael D. McCool and Wolfgang Heidrich. Texture shaders. In ACM, editor, SIGGRAPH '99. Proceedings 1999 Eurographics/SIGGRAPH workshop on Graphics hardware, Computer Graphics, pages 117–126. ACM Press, 1999.

[14] Mark S. Peercy, Marc Olano, John Airey, and P. Jeffrey Ungar. Interactive multi-pass programmable shading. In Kurt Akeley, editor, Siggraph 2000, Computer Graphics Proceedings, Annual Conference Series, pages 425–432. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2000.

[15] Kekoa Proudfoot, William R. Mark, Svetoslav Tzvetkov, and Pat Hanrahan. A real-time procedural shading system for programmable graphics. In Eugene Fiume, editor, SIGGRAPH 2001, Computer Graphics Proceedings, Annual Conference Series, pages 159–170. ACM Press / ACM SIGGRAPH, 2001.

[16] M. Rumpf and R. Strzodka. Level set segmentation in graphics hardware. In Proceedings ICIP'01, volume 3, pages 1103–1106, 2001.

[17] M. Rumpf and R. Strzodka. Using graphics cards for quantized FEM computations. In Proceedings VIIP'01, pages 193–202, 2001.

[18] A. Scheel, M. Stamminger, and H.-P. Seidel. Tone reproduction for interactive walkthroughs. Computer Graphics Forum, 19(3), August 2000.

[19] Jos Stam. A simple fluid solver based on the FFT. Journal of Graphics Tools, 6(2):43–52, 2002.

[20] Chris Trendall and A. James Stewart. General calculations using graphics hardware, with applications to interactive caustics. In Eurographics Workshop on Rendering, June 2000.


[Figure 1 panels, left to right: 8 bit results, 8 bit encolored, virtual 16 bit results, virtual 16 bit encolored]

Figure 1: From top to bottom every tenth result of a nonlinear diffusion filter applied to a noisy $256^2$ image is shown. Although the last 8 bit result may seem pleasant at first, the dark blue background and the yellow and green color of the moon clearly convey a mass defect. The new 16 bit format, on the other hand, preserves the overall mass and eliminates artefacts much more smoothly due to the finer quantization. For a direct comparison both sequences were computed on RGB8 textures (the read-back for LA8 was very slow). The 8 bit computation took 5 ms for a time-step and the virtual 16 bit computation 8 ms. Although three independent 8 bit filters could have been used in parallel on RGB8, the performance of more than 50% of the normal speed together with the higher quality results count in favour of the virtual 16 bit format.
