
Research Article

Instant Feedback Rapid Prototyping for GPU-Accelerated Computation, Manipulation, and Visualization of Multidimensional Data

Maximilian Malek and Christoph W. Sensen

Institute of Computational Biotechnology, Graz University of Technology, Graz, Austria

Correspondence should be addressed to Maximilian Malek; malek@tugraz.at

Received 16 February 2018; Accepted 10 April 2018; Published 3 June 2018

Academic Editor: Lizhi Sun

Copyright © 2018 Maximilian Malek and Christoph W. Sensen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Objective. We have created an open-source application and framework for rapid GPU-accelerated prototyping, targeting image analysis, including volumetric images such as CT or MRI data. Methods. A visual graph editor enables the design of processing pipelines without programming. Run-time compiled compute shaders enable prototyping of complex operations in a matter of minutes. Results. GPU acceleration increases the processing speed by at least an order of magnitude when compared to traditional multithreaded CPU-based implementations, while offering the flexibility of scripted implementations. Conclusion. Our framework enables real-time, intuition-guided, accelerated algorithm and method development, supported by built-in scriptable visualization. Significance. This is, to our knowledge, the first tool for medical data analysis that provides both high performance and rapid prototyping. As such, it has the potential to act as a force multiplier for further research, enabling the handling of high-resolution datasets while providing quasi-instant feedback and visualization of results.

1. Introduction

As datasets grow ever larger, so does the importance of processing them efficiently and fully utilizing the information they contain. However, most data processing is still done on generic CPUs, even though programmable GPUs capable of performing arbitrary computations have been available on the consumer market since 2006.

Often, the processing of many data types, for example, image data, is followed by some form of visualization. Most existing tools for medical images are either viewers or focused on the processing of data. Viewers focus either on medical insight [1], on pleasing visual rendering [2], or on both, depending on the use case. Visualization of data combined with human intuition can provide crucial insights into a given dataset. One example is the visualization of metadata created from patient data (e.g., 3D renditions of data derived from x-ray images, MR scans, and CT scans), allowing patients to better understand the nature of their condition. Up to now, viewers are typically not user-programmable and only provide a limited set of parameters to adjust their output.

Twenty years ago, data transformation was integrated into the rendering process, as a direct transformation alone was too costly for the hardware available at the time [3]. Current hardware has much higher capabilities, and processing datasets that were previously considered too large has become common. With the advent of programmable GPUs in the mid-2000s, GPUs are now used to perform general-purpose computations (GPGPU), processing data in parallel and reducing computation time drastically [4–7]. Although originally designed for graphics applications, the massively parallel design of GPUs allows data to be processed much more efficiently than is typically possible with traditional CPUs, while at the same time reducing the hardware footprint to the size of a graphics card. Given the increased computational capabilities, using the GPU for both processing and rendering is faster and more flexible than earlier approaches on CPUs or fixed-function GPUs [8, 9]. GPU-accelerated computation is, however, not as ubiquitous as it could be, as the development of parallel algorithms can be prohibitively difficult, especially for those not familiar with the different programming model [10]. Initial use of GPGPU techniques involved interpreting data as textures and then performing typical graphics operations on them, such as blending, projection, or interpolation [11, 12].

Hindawi, International Journal of Biomedical Imaging, Volume 2018, Article ID 2046269, 9 pages, https://doi.org/10.1155/2018/2046269

As GPGPU techniques matured, dedicated libraries for GPU programming became common, the most well-known examples being CUDA [10] and OpenCL [13]. CUDA requires a separate compilation step and is thus inadequate for rapid prototyping. In contrast, OpenGL [14], which is typically used as a backend for graphics and rendering, has long supported a run-time compiled shading language (GLSL [15, 16]). This means that the GLSL source code is passed to the graphics driver, which then dynamically compiles an appropriate binary representation for the platform. OpenGL 4.3, released in 2012, was extended to support compute shaders, which facilitate arbitrary computation directly on the graphics card.
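To make the run-time compilation model concrete, the following C++ sketch assembles a complete GLSL compute shader source string from a short user-supplied body. In an OpenGL 4.3+ context, such a string would be handed to the driver with glShaderSource and compiled with glCompileShader; those calls need a live context and are omitted here. The function name wrapComputeSnippet, the r32f image bindings, and the wrapper layout are illustrative assumptions, not the framework's actual code.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: wraps a user-written GLSL body into a complete compute
// shader. The body may read and modify the float `v` (the current voxel).
std::string wrapComputeSnippet(const std::string& userBody,
                               unsigned localX, unsigned localY, unsigned localZ) {
    assert(localX > 0 && localY > 0 && localZ > 0);
    std::ostringstream src;
    src << "#version 450\n"
        << "layout(local_size_x = " << localX << ", local_size_y = " << localY
        << ", local_size_z = " << localZ << ") in;\n"
        << "layout(binding = 0, r32f) uniform readonly image3D srcImage;\n"
        << "layout(binding = 1, r32f) uniform writeonly image3D dstImage;\n"
        << "void main() {\n"
        << "    ivec3 p = ivec3(gl_GlobalInvocationID);\n"
        << "    // Skip invocations of partially filled work groups at the border.\n"
        << "    if (any(greaterThanEqual(p, imageSize(dstImage)))) return;\n"
        << "    float v = imageLoad(srcImage, p).r;\n"
        << userBody << "\n"
        << "    imageStore(dstImage, p, vec4(v));\n"
        << "}\n";
    return src.str();
}
```

Because only a string changes between edits, the same wrapper can be regenerated and recompiled whenever the user modifies the body, which is the property that makes live editing possible in principle.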

We exploit this dynamism to enable interactive development in GPU-accelerated computing and data exploration, with medical image processing in mind. Specifically, we want to be able to see results immediately, even while editing the source code. This concept is used today, for example, for entertainment in the so-called Demoscene [17–19]. We expect this approach to also enable intuition-driven development in the domain of medical image analysis.

2. Methods

We have created a computational framework that facilitates scriptable, rapid-prototyping-friendly, GPU-accelerated computing and rendering of medical data. At the highest level of the user interface is a graph editor that controls the underlying graph-based processing pipeline, where nodes perform operations on data and edges between them indicate data flow (Figure 7). Internally, the framework consists of a scene graph describing hierarchies of objects. This is, in principle, the same architectural skeleton as used by real-time or game engines [20]. The architecture supports creating multiple scenes and performing rendering, followed either by display of the result or by further processing. As an example, a volume (3D) texture can be rendered from different perspectives via the integrated volume renderer into a number of 2D textures. Since data processing happens via a flexible, user-controlled data pipeline, the resulting textures can be further processed.
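The pipeline semantics described above (nodes operate on data, edges carry it downstream) can be illustrated with a small C++ sketch that evaluates a node graph in topological order using Kahn's algorithm. Real nodes in the framework operate on GPU textures and buffers; here each node produces a single double so that the ordering logic stays visible. The Node struct and the evaluate function are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <stdexcept>
#include <vector>

// Hypothetical node: a compute function over the values of its inputs.
struct Node {
    std::function<double(const std::vector<double>&)> compute;
    std::vector<int> inputs;  // indices of upstream nodes, in port order
};

// Evaluates every node once in topological order (Kahn's algorithm), so a
// node only runs after all of its upstream nodes have produced their values.
std::vector<double> evaluate(const std::vector<Node>& nodes) {
    const int n = static_cast<int>(nodes.size());
    std::vector<int> indegree(n, 0);
    std::vector<std::vector<int>> consumers(n);
    for (int v = 0; v < n; ++v)
        for (int u : nodes[v].inputs) { ++indegree[v]; consumers[u].push_back(v); }

    std::queue<int> ready;
    for (int v = 0; v < n; ++v)
        if (indegree[v] == 0) ready.push(v);

    std::vector<double> value(n, 0.0);
    int visited = 0;
    while (!ready.empty()) {
        const int v = ready.front();
        ready.pop();
        ++visited;
        std::vector<double> args;
        for (int u : nodes[v].inputs) args.push_back(value[u]);
        value[v] = nodes[v].compute(args);
        for (int w : consumers[v])
            if (--indegree[w] == 0) ready.push(w);
    }
    if (visited != n) throw std::runtime_error("graph contains a cycle");
    return value;
}
```

Updating only what changed, as the caption of Figure 7 describes for the actual graph, would restrict the same traversal to the nodes reachable from the modified one.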

Key features are very rapid prototyping and short iteration times, which allow results to be obtained quickly. Since large parts of the software are scripted, and all of the scripts can be changed and reloaded at any time, many features, including the user interface (UI), can be changed without restarting the framework. GPU computation is realized with compute shaders, which can be changed and reloaded in the same way as the scripts. Input/output file formats are automatically detected, and support for new formats can be added via plugins. Multiple windows and screens are supported to maximize the usable space. This feature also provides support for more advanced display configurations, such as two-projector stereoscopic 3D setups or CAVEs [21].

[Figure 1 diagram; components: Application, Core scripts, Node scripts, Lua API, ImGui, Core library, Plugin API, Plugins, Lua, SDL, OpenGL, GPU driver, System; layer labels: GPU abstraction, System abstraction.]

Figure 1: Block diagram of the software. SDL provides most of the platform-dependent functionality; everything except the lowest software layer is completely platform-independent.
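Reloading scripts without restarting implies some form of change detection. One simple approach, sketched below under the assumption that file timestamps are polled (the paper does not say how changes are detected), compares a script's last-modification time against the value seen at the previous check. The ReloadWatcher class is hypothetical and uses only std::filesystem from C++17.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>   // used to create a demo file in the usage example
#include <utility>

namespace fs = std::filesystem;

// Hypothetical sketch: remembers a file's last modification time and reports
// when the file on disk has changed, so the caller can reload it.
class ReloadWatcher {
public:
    explicit ReloadWatcher(fs::path file)
        : file_(std::move(file)), stamp_(fs::last_write_time(file_)) {}

    // Returns true once per detected modification.
    bool changed() {
        const fs::file_time_type now = fs::last_write_time(file_);
        if (now == stamp_) return false;
        stamp_ = now;
        return true;
    }

private:
    fs::path file_;
    fs::file_time_type stamp_;
};
```

An application loop could call changed() once per frame and, when it returns true, re-run the Lua script or recompile the shader, keeping the previous version active if the new one fails to load.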

Scripting is realized with Lua [22], driving application and pipeline logic, the UI, and node functionality. The rest of the application and library is implemented in C++. Aside from Lua, the external libraries utilized include SDL [23] (for cross-platform support), OpenGL, and Dear ImGui [24] (for the UI). A custom plugin interface is implemented to enable support for extensions, third-party file formats, and additional Lua functions. The GPU backend currently used is OpenGL version 4.5 [25]. OpenGL is a cross-platform graphics API that combines rendering and GPU-accelerated computation. The implementation of the backend follows modern AZDO (approaching zero driver overhead [26]) principles where applicable. Accelerated processing is performed via compute shaders written in GLSL. Due to the pipelined nature of OpenGL, most of its operations are performed in the background while the main CPU can perform other tasks [25]. The core library also supports semi-automatic multithreading of CPU-bound tasks, but since the heavy compute jobs are usually performed by the GPU, multithreading is rarely necessary. Figure 1 shows an overview of the overall framework design.
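The semi-automatic multithreading of CPU-bound tasks mentioned above is not specified further; a common pattern it could resemble is a parallel-for that splits an index range into contiguous chunks, one per hardware thread. The sketch below assumes that the per-index body is safe to run concurrently for distinct indices; parallelFor is a hypothetical name, not part of the framework's documented API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical sketch: runs body(i) for every i in [0, n), splitting the range
// into contiguous chunks that are processed on separate threads.
void parallelFor(std::size_t n, const std::function<void(std::size_t)>& body) {
    const std::size_t hw = std::max<std::size_t>(1, std::thread::hardware_concurrency());
    const std::size_t workers = std::min(hw, std::max<std::size_t>(1, n));
    const std::size_t chunk = (n + workers - 1) / workers;  // rounded-up chunk size
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < workers; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;  // no work left for further threads
        threads.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (std::thread& th : threads) th.join();  // wait before `body` goes out of scope
}
```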

From a developer's perspective, the following functions are provided. Vertex and fragment shaders facilitate the rendering tasks, and compute shaders perform the computational tasks. Shader introspection is used to determine the inputs, outputs, and parameters of the particular shaders; this information is used by the UI. Textures are used for image and data storage and can be one- to three-dimensional with 1–4 channels, using various internal formats (e.g., 8 bits to save space, 16 bits for high detail, and float for HDR data). Texture fetches can optionally be customized with a swizzle mask; that is, the order in which color components are sampled is user-controllable. GPU buffer objects provide support for arbitrary unformatted memory using persistent and coherent memory mapping, so they can be manipulated from either the CPU or the GPU side without special constraints. The following OpenGL extensions are automatically used when supported by the system: ARB_bindless_texture, ARB_gpu_shader_int64, ARB_gpu_shader5, memory info, and various robustness extensions to recover from driver crashes (ARB_robustness, ARB_robust_buffer_access_behavior, and ARB_create_context_robustness, as provided by SDL). Despite their benefits and simplicity, OpenGL compute shaders are not intended for very large datasets, as their running time is usually limited by the graphics driver; two seconds is the default for recent NVIDIA drivers on Windows. If a compute shader has not finished within that time limit, it is forcefully terminated. On Windows, the graphics driver is reset, which usually causes program termination. In order to overcome this limitation, large datasets are automatically split into smaller tiles, which are then processed individually across multiple shader invocations.
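The tiling strategy is not described in detail; a minimal sketch, assuming axis-aligned box tiles of a fixed maximum size, could look as follows. Each tile would then be processed by its own, shorter compute dispatch (with the tile origin passed to the shader, e.g., as a uniform offset), keeping each dispatch under the driver's time limit. The Tile struct, the splitIntoTiles function, and the tile size in the example are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Origin (x, y, z) and extent (w, h, d) of one tile, in voxels.
struct Tile { uint32_t x, y, z, w, h, d; };

// Hypothetical sketch: splits a W x H x D volume into tiles of at most
// tileX x tileY x tileZ voxels; border tiles are clipped to the volume.
std::vector<Tile> splitIntoTiles(uint32_t W, uint32_t H, uint32_t D,
                                 uint32_t tileX, uint32_t tileY, uint32_t tileZ) {
    assert(tileX > 0 && tileY > 0 && tileZ > 0);
    std::vector<Tile> tiles;
    for (uint32_t z = 0; z < D; z += tileZ)
        for (uint32_t y = 0; y < H; y += tileY)
            for (uint32_t x = 0; x < W; x += tileX)
                tiles.push_back({x, y, z,
                                 std::min(tileX, W - x),
                                 std::min(tileY, H - y),
                                 std::min(tileZ, D - z)});
    return tiles;
}
```

In practice, the tile size would have to be chosen from the expected per-voxel cost of the shader, since the watchdog limits wall-clock time rather than voxel count.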

A simple custom file format to store up to three-dimensional image data is included. It supports lossless compression via the Zstandard algorithm [27] and is optimized for fast loading and simplicity, in order to keep the core clean. Two optional standard plugins are provided. The first one uses ITK [28] to add support for the Bitmap, JPEG, GDCM (DICOM), GIPL, MetaImage, Nrrd, TIFF, PNG, Stimulate, VTK, Nifti, and HDF5 [29] file formats; a full list can be found in the ITK wiki [30]. The second one uses the stb_image library [31] and adds support for PNG, Bitmap, TGA, HDR, JPEG, PSD (Photoshop), PNM/PPM, and GIF files. The two plugins are independent of each other, and we consider especially the latter a good starting point for users trying to implement their own plugins. The API does not expose implementation details or the GPU backend and is version-compatible in both directions. Details about the plugin API can be found in the supplement. In order to facilitate scripting and script debugging, a built-in real-time data inspector is included, which can be used to traverse any Lua object along with attached variables, functions, and classes, making it possible to preview values and data objects where supported.
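Automatic format detection is commonly done by inspecting a file's leading bytes rather than trusting its extension. The following sketch, whose registry layout and function names are our assumptions and not the plugin API, matches a file header against a table of well-known signatures (PNG, HDF5, JPEG, GIF, NRRD, and DICOM's "DICM" marker at byte offset 128); a plugin could register further entries for its own formats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// A format name plus the byte signature expected at a fixed file offset.
struct Signature {
    std::string format;
    std::size_t offset;
    std::vector<uint8_t> magic;
};

// Returns the first registered format whose signature matches the header.
std::string detectFormat(const std::vector<uint8_t>& header,
                         const std::vector<Signature>& registry) {
    for (const Signature& s : registry) {
        if (header.size() >= s.offset + s.magic.size() &&
            std::memcmp(header.data() + s.offset, s.magic.data(), s.magic.size()) == 0)
            return s.format;
    }
    return "unknown";
}

// Hypothetical default table of well-known signatures.
std::vector<Signature> defaultRegistry() {
    return {
        {"PNG",   0,   {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A}},
        {"HDF5",  0,   {0x89, 'H', 'D', 'F', 0x0D, 0x0A, 0x1A, 0x0A}},
        {"JPEG",  0,   {0xFF, 0xD8, 0xFF}},
        {"GIF",   0,   {'G', 'I', 'F', '8'}},
        {"NRRD",  0,   {'N', 'R', 'R', 'D'}},
        {"DICOM", 128, {'D', 'I', 'C', 'M'}},
    };
}
```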

From a user's perspective, a number of nodes are already included, implementing the following filters/algorithms: curvature, derivative (edge detection), distance transform, various simple math operations (element-wise addition, subtraction, multiplication, division, power function, i.e., all functions supported by GLSL), min/max/average region filter, median filter, surface normal extraction, 3D → 2D slice extraction, convolution (Gaussian blur), type conversion, and thresholding. Two nodes accept a custom GLSL code snippet from the user, enabling live programming. The first compiles the entered code to a compute shader to process or generate arbitrary data. The second compiles it to a fragment shader that generates an arbitrary 2D texture, with limited compatibility with Shadertoy [17]. Other nodes that do not perform computation can act as data sources or sinks, although there is no clear separation between the two roles. Image or memory buffer loaders are pure sources. A volume renderer essentially transforms a 3D dataset into a 2D image, given a perspective. A universal memory viewer with an included hex editor (that works on CPU and GPU memory) is useful to diagnose low-level memory layout problems.
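The 3D → 2D slice extraction node boils down to index arithmetic. The sketch below assumes the volume is stored x-fastest, then y, then z (the layout OpenGL uses for 3D texture uploads), so that voxel (x, y, z) of a W × H × D volume sits at index x + W · (y + H · z). A slice at fixed z is a contiguous block, while a slice at fixed y is strided. The function names are hypothetical; on the GPU, each output pixel would be one shader invocation performing the same lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Slice at fixed z: copies one contiguous W * H block.
std::vector<float> extractSliceZ(const std::vector<float>& volume,
                                 std::size_t W, std::size_t H, std::size_t D,
                                 std::size_t z) {
    assert(volume.size() == W * H * D && z < D);
    std::vector<float> slice(W * H);
    for (std::size_t y = 0; y < H; ++y)
        for (std::size_t x = 0; x < W; ++x)
            slice[x + W * y] = volume[x + W * (y + H * z)];
    return slice;
}

// Slice at fixed y: a W x D image gathered with a stride of W * H per z step.
std::vector<float> extractSliceY(const std::vector<float>& volume,
                                 std::size_t W, std::size_t H, std::size_t D,
                                 std::size_t y) {
    assert(volume.size() == W * H * D && y < H);
    std::vector<float> slice(W * D);
    for (std::size_t z = 0; z < D; ++z)
        for (std::size_t x = 0; x < W; ++x)
            slice[x + W * z] = volume[x + W * (y + H * z)];
    return slice;
}
```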

The layered architecture allows even novice users to design a processing pipeline visually and interact with the provided widgets without any programming. For quick familiarization with the UI, context-sensitive help descriptions and tooltips, as well as a general guide, are displayed when appropriate. Novice users are only limited by their knowledge of what existing algorithms do, how to use them, and how to combine them to perform a higher-level task. More advanced users can quickly develop new nodes using GLSL for the computation and Lua for the interface; thus, computation is always GPU-accelerated, and scripting facilitates rapid development. This also enables users to quickly write prototype code for specific use cases. A node is implemented as a single Lua script, optionally containing GLSL code. Figure 5 shows a complete example. The supplement contains more information and examples detailing the node API and the implementation of custom nodes.

3. Results

3.1. User Interface. Our UI is designed for fast iteration, real-time interaction, and parameter adjustment, to provide as much visual feedback as possible and to clearly highlight user errors when (and also how) they occur. Data manipulation effectively happens by linking nodes together to form a directed acyclic graph (DAG). Color-coded connectors prevent accidental type-incompatible connections, and it is not possible to construct cycles. Any attempt to perform such erroneous graph constructions immediately displays an appropriate error message (Figure 6).
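The two checks described above, matching connector types and rejecting cycles, can be sketched in C++ as follows. Adding an edge from → to closes a cycle exactly when from is already reachable from to, so a depth-first search before insertion suffices. The Graph struct, the string type tags, and the error messages are illustrative assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical editor-side graph: adjacency list of downstream nodes.
struct Graph {
    std::vector<std::vector<int>> out;

    explicit Graph(int n) : out(n) {}

    // Iterative depth-first search: is `target` reachable from `start`?
    bool reaches(int start, int target) const {
        std::vector<bool> seen(out.size(), false);
        std::vector<int> stack{start};
        while (!stack.empty()) {
            const int v = stack.back();
            stack.pop_back();
            if (v == target) return true;
            if (seen[v]) continue;
            seen[v] = true;
            for (int w : out[v]) stack.push_back(w);
        }
        return false;
    }

    // Returns an empty string on success, otherwise an error message.
    std::string connect(int from, const std::string& outType,
                        int to, const std::string& inType) {
        if (outType != inType)
            return "type mismatch: " + outType + " -> " + inType;
        // from -> to creates a cycle iff `from` is already reachable from `to`
        // (this also rejects self-loops, where from == to).
        if (reaches(to, from))
            return "connection would create a cycle";
        out[from].push_back(to);
        return "";
    }
};
```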

For developers writing their own nodes, there are automatisms in place that attempt to predict input/output properties and how to invoke a compute shader for commonly used scenarios, simplifying development even further. Figure 5 shows an example, and the software documentation provides a more detailed explanation. If more customization is required, almost all functionality of a node can be implemented specifically, including the UI.
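One such automatism is deriving the dispatch size from the output dimensions. Shader introspection can report a compute shader's local work-group size (for instance via glGetProgramiv with GL_COMPUTE_WORK_GROUP_SIZE), and the number of work groups per axis is then a rounded-up division. The sketch below shows only that arithmetic; workGroupCount is a hypothetical helper, and its result would be passed to glDispatchCompute.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Number of work groups per axis needed to cover `size` voxels when each
// work group has `localSize` invocations along that axis.
std::array<uint32_t, 3> workGroupCount(std::array<uint32_t, 3> size,
                                       std::array<uint32_t, 3> localSize) {
    std::array<uint32_t, 3> groups{};
    for (int i = 0; i < 3; ++i) {
        assert(localSize[i] > 0);
        // Round up so the border voxels are covered; the shader must then
        // skip invocations that fall outside `size`.
        groups[i] = (size[i] + localSize[i] - 1) / localSize[i];
    }
    return groups;
}
```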

3.2. Comparison to Existing Tools. We have compared our software package to MeVisLab [32], DeVIDE [33], GRAPE [34], GraphMIC [35], and FAST [4]. While FAST is a stand-alone library for OpenCL-accelerated image manipulation and visualization, the other packages are mainly focused on graph-based image processing, and all of them utilize ITK, VTK [36], MITK [37, 38], or a variation of these libraries to perform the computation and rendering tasks. Consequently, they share ITK's main weakness, that is, CPU-bound processing without (or with very limited) GPU acceleration [39]. All tools except MeVisLab are provided as open-source packages.

In comparison to our package, DeVIDE requires detailed knowledge of ITK, as it directly maps ITK functions to graph nodes. A version of the package that does not require ITK exists but has limited functionality. The DeVIDE UI is not intuitive to use, as nodes use internal names and their function is not always clear. Connectors are neither labelled nor color-coded, and any input can be connected to any output. Mismatched connectors or cycles cause an error when trying to execute the graph. Missing parameters for a node are not signaled until an attempt is made to execute the graph, upon which an error is shown. We were unable to do anything meaningful with DeVIDE, since we could not identify a valid combination of nodes that, when connected together, would produce output without causing graph execution to fail. The UI has no apparent preview functionality. DeVIDE is scriptable in Python.


MeVisLab is a large, well-established commercial software package intended for rapid prototyping and image manipulation applications. It is rapid-prototyping-friendly in the sense that results can be obtained quickly, and previews at every stage provide visual feedback. However, implementing any custom extension requires the MeVisLab SDK and a C++ compiler; therefore, the actual development of custom extensions is not rapid-prototyping-friendly. Constructing a graph containing a cycle resulted in a crash. MeVisLab is scriptable in Python.

GraphMIC is only available for macOS; thus, we decided not to test it, as we wanted to focus on platform-independent packages. From the documentation, it seems to be conceptually very similar to DeVIDE, but it should be more rapid-prototyping- and user-friendly, since it supports previews, parameters can be adjusted directly on the nodes, and input/output connectors are clearly labelled. This package is also scriptable in Python.

GRAPE is similar to GraphMIC and DeVIDE and is mentioned for completeness.

FAST is not a graph editor but a C++ library, similar to ITK and VTK, that aims to cover similar use cases. It uses OpenCL to accelerate image operations and OpenGL to render results. There is no scripting or UI, and in its current form it is not very rapid-prototyping-friendly, since it targets usage by C++ programmers only.

In conclusion, all of the listed graph-based tools are based on ITK and VTK. Other common dependencies are Qt [40], Python [41], and Boost [42]. These libraries are very large and can be a hassle to build or to get working properly. In contrast, our package depends on only a single external library, SDL, which is small and easy to build on many operating systems. We have tested our package on Windows and Linux, and as soon as a working OpenGL 4.5 driver for macOS is available, it will be supported as well. Therefore, we expect that, in comparison to similar packages, our solution will be the easiest to deploy, as long as the system's graphics driver supports at least OpenGL 4.5.

3.3. Usage Examples. The test system used for our benchmarking efforts was a consumer notebook with an Intel Core i7-4790S CPU at 3.2 GHz (4 cores, 8 threads) and an NVIDIA GeForce GTX 965M graphics card with 4 GB memory, operating under Windows 8.1. The software was compiled using Visual Studio 2015 Update 3. All pipelines shown below are included in the release package as examples. We do not include MRI source data due to data protection. Good sources of initial test data are the Digimouse [43] and the Vand3 [44] datasets. Figure 2 shows an example of some of the rendering modes possible with the built-in volume renderer node.

3.4. Distance Transform. We have performed a benchmark comparing the performance of a 3D distance transform implemented in multithreaded C++ and in GLSL, respectively. Specifically, we chose the fast distance transform method from [45], because the algorithm is not a pointwise operation (and therefore not trivially GPU-parallelizable) and needs an initial preparation pass plus one pass for each axis (4 in total).

Figure 2: Selection of different renderings of CT data produced by the included volume renderer node and some supporting nodes. From left to right: solid with curvature as color; solid, rescaled, with curvature as color; edge/step function; solid, rescaled, translucent.

The distance transform can be used for further processing. An example is given in Figure 3.
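For readers unfamiliar with the pass structure described above, the following C++ sketch implements a separable squared Euclidean distance transform with exactly that shape: one preparation pass that marks voxels at or above the threshold as distance 0 and all others as "far", followed by one 1D pass per axis. For the 1D pass we use the Felzenszwalb–Huttenlocher lower-envelope algorithm; whether this matches the method of [45] in every detail is an assumption, as are the function names. Every 1D line is independent, which is what allows each axis pass to be parallelized (one GPU invocation per line); the sketch itself is sequential and returns squared distances (take the square root for Euclidean distances).

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Large finite stand-in for "infinitely far"; a true infinity would turn the
// intersection formula below into inf - inf = NaN.
constexpr float kFar = 1e20f;

// In-place 1D squared distance transform of n samples spaced `stride` apart
// (lower envelope of parabolas, Felzenszwalb-Huttenlocher style).
void edt1d(float* data, int n, std::ptrdiff_t stride) {
    std::vector<float> f(n), z(n + 1);
    std::vector<int> v(n);  // positions of the parabolas in the envelope
    for (int i = 0; i < n; ++i) f[i] = data[i * stride];
    auto intersect = [&](int q, int p) {  // where the parabolas at q and p meet
        return ((f[q] + float(q) * q) - (f[p] + float(p) * p)) / (2.0f * (q - p));
    };
    const float inf = std::numeric_limits<float>::infinity();
    int k = 0;
    v[0] = 0; z[0] = -inf; z[1] = inf;
    for (int q = 1; q < n; ++q) {
        float s = intersect(q, v[k]);
        while (s <= z[k]) { --k; s = intersect(q, v[k]); }  // drop hidden parabolas
        ++k; v[k] = q; z[k] = s; z[k + 1] = inf;
    }
    k = 0;
    for (int q = 0; q < n; ++q) {
        while (z[k + 1] < q) ++k;
        const float dq = float(q - v[k]);
        data[q * stride] = dq * dq + f[v[k]];
    }
}

// Squared EDT of a W x H x D volume stored x-fastest: distance of every voxel
// to the nearest voxel whose value is >= threshold.
std::vector<float> squaredEdt3d(const std::vector<float>& volume,
                                int W, int H, int D, float threshold) {
    assert(volume.size() == std::size_t(W) * H * D);
    std::vector<float> dist(volume.size());
    for (std::size_t i = 0; i < volume.size(); ++i)               // pass 1: prepare
        dist[i] = volume[i] >= threshold ? 0.0f : kFar;
    for (int z = 0; z < D; ++z)                                    // pass 2: x lines
        for (int y = 0; y < H; ++y)
            edt1d(&dist[std::size_t(W) * (y + std::size_t(H) * z)], W, 1);
    for (int z = 0; z < D; ++z)                                    // pass 3: y lines
        for (int x = 0; x < W; ++x)
            edt1d(&dist[x + std::size_t(W) * H * z], H, W);
    for (int y = 0; y < H; ++y)                                    // pass 4: z lines
        for (int x = 0; x < W; ++x)
            edt1d(&dist[x + std::size_t(W) * y], D, std::ptrdiff_t(W) * H);
    return dist;
}
```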

Given the results in Table 1, we conclude that our GPU variant is about two orders of magnitude faster than the CPU-based 3D distance transform for this specific computation, enabling almost interactive operation (e.g., changing the threshold and observing the results). Figure 3 shows an extended use case: a custom compute node executes a snippet of GLSL code that utilizes the distance transform to remove most of a human skull. The entire snippet runs in less than 4 ms for a 256 × 256 × 176 volume. While the rendered result is not perfect and a bit noisy, the approach, including the parameters used for the transformation, was developed on the fly within a few minutes. Excluding the distance transform (which has to be computed only once) but including volume rendering of the result in stereo 3D, the pipeline takes less than 20 ms to execute.

3.5. Segmentation. A typical operation in medical image analysis is tissue segmentation. We have developed a simple proof-of-concept segmentation pipeline that extracts a specific intensity range, followed by a smoothing and a threshold operation to form a mask. This mask is then used to segment skin and skull from the remaining tissue. Another quickly developed code snippet performs the remaining segmentation and contrast enhancement. For visualization, a slice is extracted (Figure 4). Note that the segmentation was performed on the complete 256 × 256 × 176 volume instead of only a single slice. The whole pipeline executes in about 161 ms for this volume, of which the final segmentation snippet took 4 ms. An alternative pipeline that processes the same volume in 22 ms is also included in the examples.
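The mask-forming stages of this pipeline (intensity window, smoothing, threshold, mask application) can be sketched on a single 2D slice as follows. The paper's pipeline operates on the full 3D volume on the GPU, and its exact filters and parameters are not given; the 3 × 3 box filter, the function name segmentRange, and all parameter values are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch on a row-major W x H image:
// (1) mark pixels inside [lo, hi], (2) smooth that binary map with a 3x3 box
// filter, (3) threshold the smoothed map into a mask, (4) apply the mask.
std::vector<float> segmentRange(const std::vector<float>& img, int W, int H,
                                float lo, float hi, float maskThreshold) {
    assert(img.size() == static_cast<std::size_t>(W) * H);
    std::vector<float> inRange(img.size());
    for (std::size_t i = 0; i < img.size(); ++i)                       // step 1
        inRange[i] = (img[i] >= lo && img[i] <= hi) ? 1.0f : 0.0f;

    std::vector<float> out(img.size());
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            float sum = 0.0f;
            int count = 0;
            for (int dy = -1; dy <= 1; ++dy)                            // step 2
                for (int dx = -1; dx <= 1; ++dx) {
                    const int xx = x + dx, yy = y + dy;
                    if (xx < 0 || yy < 0 || xx >= W || yy >= H) continue;
                    sum += inRange[yy * W + xx];
                    ++count;
                }
            const bool inMask = sum / count >= maskThreshold;          // step 3
            out[y * W + x] = inMask ? img[y * W + x] : 0.0f;           // step 4
        }
    return out;
}
```

The smoothing step makes the mask more robust: isolated in-range pixels are dropped, while small out-of-range holes inside a larger in-range region are filled in.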

3.6. Other Benchmarks. We have included two more examples of simple per-voxel operations in the benchmarks in Table 1: median filtering and intensity rescaling. Intensity rescaling is a two-step operation. First, the minimal and maximal intensity values are determined; then each pixel value is scaled accordingly, so that the output values are in [0, 1]. Simple operations like this, such as Gaussian smoothing, arbitrary convolutions, edge detection, thresholding, and filtering, can benefit even more from GPU acceleration. This kind of relatively simple operation can finish within a few milliseconds (ms) for typical input sizes (e.g., a volume of 512³ voxels), allowing fully interactive use and parameter adjustment in real time (Figures 4 and 5). Median filtering on the GPU is costly due to the sorting involved; sorting is implemented using a parallel sorting network [46].

Figure 3: The distance transform from Table 1, combined with a custom compute node running the GLSL snippet on the right, is a simple way to remove most of the skull in a head MRI scan. The result is red/cyan stereo-rendered.

Figure 4: Slice of a brain MRI scan. (a) is normalized but otherwise unprocessed. (b) is segmented into skull/skin (teal), cerebrospinal fluid (red), and brain (grey) tissue. The brain tissue is contrast-enhanced.

Table 1: Timings for selected operations on 3 different volumes. The speedup factor between CPU and GPU is calculated as the lowest CPU time (8 threads) divided by the GPU time. The 3D Euclidean distance transform operates on normalized volumes (pixel values in [0, 1]) with a solidity threshold of 0.5. The CPU implementation is our own, including the multithreading. Intensity rescaling and median filtering use ITK's CPU-based implementation. The median filter takes the direct neighborhood of each voxel into account, that is, a box of 27 voxels in total.

| Operation | Volume | CPU (1 thread) | CPU (8 threads) | GPU | Speedup 1 vs. 8 threads | Speedup CPU vs. GPU |
|---|---|---|---|---|---|---|
| Distance transform | 256 × 256 × 176 | 134.5 s | 45.9 s | 630 ms | 2.9x | 73x |
| Distance transform | 512 × 512 × 19 | 63.2 s | 20.3 s | 217 ms | 3.1x | 93x |
| Distance transform | 380 × 992 × 208 | 1631 s | 627 s | 4900 ms | 2.6x | 128x |
| Rescale intensity to [0, 1] | 256 × 256 × 176 | 88 ms | 68 ms | 1.01 ms | 1.29x | 67x |
| Rescale intensity to [0, 1] | 512 × 512 × 19 | 38 ms | 30 ms | 0.72 ms | 1.27x | 41.7x |
| Rescale intensity to [0, 1] | 380 × 992 × 208 | 603 ms | 457 ms | 9.75 ms | 1.31x | 47x |
| Median filter | 256 × 256 × 176 | 4155 ms | 1544 ms | 97 ms | 2.7x | 15.9x |
| Median filter | 512 × 512 × 19 | 2120 ms | 671 ms | 48 ms | 3.15x | 14x |
| Median filter | 380 × 992 × 208 | 14 s | 4800 ms | 715 ms | 2.9x | 6.7x |

Figure 5: A minimal working example. This node calculates (v + add)^exp for every pixel and color channel in a 2D image, resulting in contrast enhancement. The only Lua code is the definition table in the last line; the rest is GLSL embedded in a Lua string. Missing interface functions are automatically induced. The resulting graphical representation is displayed in the small box (the node with a dark background and green title bar). More information can be found in the supplement.
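Both benchmarked operations are easy to state in code. The sketch below shows the two-pass rescaling (find the minimum and maximum, then map v to (v - min) / (max - min)) and a 27-voxel median computed with odd-even transposition sort, one of the simplest sorting networks: a fixed sequence of compare-exchange steps whose control flow does not depend on the data, which suits GPU threads. Whether [46] describes this particular network or a different one (e.g., bitonic sorting) is not stated here; the function names are hypothetical, and on the GPU the min/max pass would be a parallel reduction rather than a sequential scan.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Two-pass rescaling to [0, 1]: pass 1 finds min and max, pass 2 scales.
std::vector<float> rescaleToUnit(const std::vector<float>& v) {
    assert(!v.empty());
    const auto [mn, mx] = std::minmax_element(v.begin(), v.end());
    const float lo = *mn, range = *mx - *mn;
    std::vector<float> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        out[i] = range > 0.0f ? (v[i] - lo) / range : 0.0f;  // constant input -> 0
    return out;
}

// Median of a 3x3x3 neighbourhood via odd-even transposition sort:
// 27 rounds of fixed compare-exchange pairs always sort 27 elements.
float median27(std::array<float, 27> a) {
    for (int round = 0; round < 27; ++round)
        for (int i = round % 2; i + 1 < 27; i += 2) {
            const float lo = std::min(a[i], a[i + 1]);
            const float hi = std::max(a[i], a[i + 1]);
            a[i] = lo;
            a[i + 1] = hi;
        }
    return a[13];  // middle element of the sorted neighbourhood
}
```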

4. Discussion

We expect that our framework will make GPU-accelerated data processing more accessible. While the examples given so far mainly target use cases in the medical domain, our method is not limited to these and can be used for many kinds of 3D data processing that involve images or parallelizable operations on a block of data. The motivation to create a self-contained framework for rapid prototyping with built-in visualization came from difficulties encountered when developing certain methods to operate on CT/MRI data. The intended algorithms were too costly to execute on CPUs in any reasonable time, so GPU acceleration was a necessity. Many time-consuming problems during development could have been avoided if not only the inspection of data had been supported visually, but the in-memory representation had also been easily accessible. Our prototype of the pipeline was hardcoded in C++ using plain OpenGL [47], without rapid prototyping functionality or automatic memory management; thus, changing parameters or rewiring pipeline connections required recompiling and rerunning the whole program, often causing problems due to technical oversights and remaining bugs. Our current framework attempts to solve these problems. It accelerates the once time-consuming part of pipeline development, so that the implementation cycles for new features are much shorter and also much more efficient. Regarding the implementation, we intentionally rely solely on OpenGL for GPU acceleration, because it is the most compatible, complete, and platform-independent graphics and compute API currently available. Other options are either vendor-locked (CUDA), have no graphics capabilities (OpenCL), or are exclusive to an operating system family (DirectX, Metal). OpenGL does not come without problems, however: GPU drivers are mostly proprietary, each implementing a different interpretation of the OpenGL specification, sometimes exposing implementation differences and bugs [4, 47]. This may affect the core implementation; therefore, care has been taken to adhere to the OpenGL 4.5 specification, but some drivers may still cause problems. In practice, this may also cause user-written GLSL shaders to work fine on a specific setup but fail to compile or misbehave on another, if the respective drivers' GLSL compilers exhibit differences.

We also deliberately chose not only to enable but also to enforce that all node computations are performed by the GPU. While this rules out incorporating popular libraries such as ITK and OpenCV, we believe that this is the only way to ensure a consistently fast pipeline without unnecessarily slow legacy components. The current focus/use case of the application is novel development, rather than reusing existing components in a new package. We do not expect this to be a problem for most users, as this is what sets our framework apart from others.

Given the ability to type in code at run time and inspect results immediately, intuition-guided exploration of data is much easier and more interactive than with existing solutions, which mostly offer premade building blocks. The combination of rapid prototyping throughout our entire implementation with GPU acceleration of all calculations allows users to process larger datasets faster than with any of the other tools, as these were apparently not built with this kind of dynamism in mind.

A selection of algorithms is included in the first public release. They might not be sufficient to cover all use cases; therefore, use in medical research is not yet the primary focus of our package. However, even at this early stage of development, our software is useful for teaching and can be extended for specific tasks by anyone.

4.1. Future Work. In its current state, the software is a stand-alone application rather than a library. Future functionality may include a program exporter: entire processing pipelines would be designed inside the user interface and then exported as a single script that implements this dataflow graph. This would be useful for batch processing and inclusion in existing pipelines, and it is expected to greatly simplify deployment for end users. We also plan to include support for point clouds and regular 3D meshes, as they are strongly tied to graphics development and benefit greatly from GPU-accelerated processing. We believe that combining the ability to manipulate and visualize these types of data in one package will be beneficial to other related fields. Moving away from OpenGL is not a plan for the immediate future but a long-term goal. Switching to Vulkan [48] as a backend would not only enable multithreaded, multi-GPU computation but also enable support for mobile devices (e.g., Android-based tablets), and it would hopefully minimize driver-specific behavior when compared to OpenGL.

International Journal of Biomedical Imaging 7

Figure 6: Comparison of our graph editor (b) with the interface of DeVIDE (a). Our graph editor shows a lot more detail, including a preview when hovering over inputs/outputs, if possible. It also checks for misuse; that is, it ensures connector type compatibility and prevents cycles. The depicted DeVIDE graph did not work despite trying multiple variants, and there was no sign of an error when building the graph.

Figure 7: A dataflow graph to visualize a mouse CT volume. Both "3D Volume" nodes render a volume texture into a 2D image, given individual settings. In this example, the resulting 2D image is set to 4K resolution (3840 × 2160). Each node displays an estimate of the time, in milliseconds, that it spent computing on the GPU. Any parameter or image changes propagate downstream, so that the whole graph updates itself as necessary.

Data Availability

Program and source code are accessible at https://bitbucket.org/maxmalek/xcv. The mouse volume used in some of the examples was converted from the Digimouse dataset [43]. The human head MRI data set was donated under the condition of anonymity and cannot be published.

Conflicts of Interest

There are no conflicts of interest, financial or otherwise, associated with this paper.

Acknowledgments

This work was supported by the TU Graz Open Access Publishing Fund.

Supplementary Materials

The supplement is a quick-start and developer's guide for our software, xcv. It describes how to extend the program, the related APIs, and good GLSL practices to maximize compatibility across graphics drivers. The supplement also provides a hands-on example for the implementation of a new node type (Supplementary Materials).

References

[1] D. Fortmeier, Direct volume rendering methods for needle insertion simulation [Ph.D. dissertation], University of Lübeck, Germany, 2016, http://www.zhb.uni-luebeck.de/epubs/ediss1748.pdf.

[2] J. Zhou, X. Wang, H. Cui et al., "Topology-aware illumination design for volume rendering," BMC Bioinformatics, vol. 17, no. 1, 2016.

[3] R. Srinivasan and S. Fang, "Integrating volume morphing and visualization," Computational Geometry, vol. 15, no. 1-3, pp. 149–159, 2000.

[4] E. Smistad, M. Bozorgi, and F. Lindseth, "FAST: framework for heterogeneous medical image computing and visualization," International Journal for Computer Assisted Radiology and Surgery, vol. 10, no. 11, pp. 1811–1822, 2015.

[5] M. A. Akhloufi, F. Gariepy, and G. Champagne, "GPGPU real-time texture analysis framework," in Proceedings of the Conference on Parallel Processing for Imaging Applications 2011, J. D. Owens, I. Lin, Y. Zhang, and G. B. Beretta, Eds., vol. 7872, pp. 7872-7872, San Francisco Airport, CA, USA, 2011.


[6] R. P. Broussard and R. W. Ives, "Using a commercial graphical processing unit and the CUDA programming language to accelerate scientific image processing applications," in Proceedings of the Conference on Parallel Processing for Imaging Applications 2011, J. D. Owens, I. Lin, Y. Zhang, and G. B. Beretta, Eds., SPIE Proceedings, San Francisco Airport, California, USA.

[7] A. Eklund, M. Andersson, and H. Knutsson, "fMRI analysis on the GPU - Possibilities and challenges," Computer Methods and Programs in Biomedicine, vol. 105, no. 2, pp. 145–161, 2012.

[8] F. Dachille, K. Kreeger, B. Chen, I. Bitter, and A. E. Kaufman, "High-quality volume rendering using texture mapping hardware," in Proceedings of the 1998 ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, A. E. Kaufman, W. Straßer, G. Knittel, H. Pfister, and S. N. Spencer, Eds., vol. 31, Lisbon, Portugal, 1998.

[9] M. D. Hanwell, K. M. Martin, A. Chaudhary, and L. S. Avila, "The Visualization Toolkit (VTK): Rewriting the rendering code for modern graphics cards," SoftwareX, vol. 1-2, pp. 9–12, 2015.

[10] J. Sanders and E. Kandrot, CUDA by Example: An Introduction to General-Purpose GPU Programming, Addison-Wesley, 2011, http://www.worldcat.org/oclc/699702402.

[11] R. Yang and G. Welch, "Fast image segmentation and smoothing using commodity graphics hardware," Journal of Graphics (GPU & Game) Tools, vol. 7, no. 4, pp. 91–100, 2002.

[12] F. D. Igual, R. Mayo, T. D. R. Hartley, U. V. Catalyurek, A. Ruiz, and M. Ujaldon, "Exploring the GPU for enhancing parallelism on color and texture analysis," in Parallel Computing: From Multicores and GPUs to Petascale, Proceedings of the Conference ParCo, B. M. Chapman, F. Desprez, G. R. Joubert, A. Lichnewsky, F. J. Peters, and T. Priol, Eds., vol. 19, pp. 299–306, IOS Press, 2009.

[13] K. O. W. Group, The OpenCL Specification, Version 2.2, 2017, https://www.khronos.org/opencl/.

[14] P. Cozzi and C. Riccio, OpenGL Insights, CRC Press, 2012, http://www.openglinsights.com.

[15] M. Bailey and S. Cunningham, "Computer graphics shaders using OpenGL 4.X," in Proceedings of the ACM SIGGRAPH ASIA 2010 Courses, pp. 1–173, Seoul, Republic of Korea, December 2010.

[16] M. Bailey and S. Cunningham, Eds., Graphics Shaders: Theory and Practice, Taylor & Francis, 2nd edition, 2011.

[17] P. Jeremias and I. Quilez, "Shadertoy: Learn to create everything in a fragment shader," in Proceedings of the SIGGRAPH Asia 2014 Courses, pp. 1–15, New York, NY, USA, December 2014.

[18] M. Reunanen, Times of change in the demoscene: A creative community and its relationship with technology [Ph.D. dissertation], University of Turku, 2017.

[19] D. Hartmann, Digital Art Natives: Praktiken, Artefakte und Strukturen der Computer-Demoszene, Kulturverlag Kadmos, Berlin, http://digitalartnatives.de.

[20] H. Marin-Vega, G. Alor-Hernandez, R. Zatarain-Cabada, M. L. Barron-Estrada, and J. L. García-Alcaraz, "A Brief Review of Game Engines for Educational and Serious Games Development," Journal of Information Technology Research, vol. 10, no. 4, pp. 1–22, 2017.

[21] A. L. Turinsky, E. Fanea, Q. Trinh et al., "CAVEman: Standardized anatomical context for biomedical data mapping," Anatomical Sciences Education, vol. 1, no. 1, pp. 10–18, 2008.

[22] R. Ierusalimschy, L. H. de Figueiredo, and W. C. Filho, "Lua - an extensible extension language," Software: Practice and Experience, vol. 26, no. 6, pp. 635–652, 1996.

[23] S. Lantinga, Simple DirectMedia Layer, 2017, http://libsdl.org.

[24] O. Cornut, https://github.com/ocornut/imgui.

[25] K. O. W. Group, The OpenGL Graphics System: A Specification (Version 4.5 (Core Profile)), https://www.khronos.org/registry/OpenGL/specs/gl/glspec45.core.pdf.

[26] C. Everitt, G. Sellers, J. McDonald, and T. Foley, https://www.slideshare.net/CassEveritt/approaching-zero-driver-overhead.

[27] Y. Collet, Zstandard, 2015, http://zstd.net.

[28] H. J. Johnson, M. McCormick, L. Ibanez, and T. I. S. Consortium, The ITK Software Guide, Kitware Inc., 3rd edition, https://itk.org/ItkSoftwareGuide.pdf.

[29] The HDF Group (1997-2017), Hierarchical Data Format, version 5, http://www.hdfgroup.org/HDF5.

[30] https://itk.org/Wiki/ITK/File_Formats.

[31] S. Barrett, https://github.com/nothings/stb.

[32] F. Ritter, T. Boskamp, A. Homeyer et al., "Medical image analysis," IEEE Pulse, vol. 2, no. 6, pp. 60–70, 2011.

[33] C. P. Botha, "DeVIDE: The Delft visualisation and image processing development environment," Tech. Rep., Delft Technical University, 2004, https://graphics.tudelft.nl/Publications-new/2004/BO04a.

[34] R. E. Gabr, G. B. Tefera, W. J. Allen, A. S. Pednekar, and P. A. Narayana, "Erratum to: GRAPE: a graphical pipeline environment for image analysis in adaptive magnetic resonance imaging," International Journal for Computer Assisted Radiology and Surgery, vol. 12, no. 3, pp. 459-459, 2017.

[35] A. E. Szalo, A. Zehner, and C. Palm, "GraphMIC," in Bildverarbeitung für die Medizin 2015, Informatik aktuell, pp. 395–400, Springer Berlin Heidelberg, Berlin, Heidelberg, 2015.

[36] W. Schroeder, K. Martin, and B. Lorensen, The Visualization Toolkit, Kitware Inc., 3rd edition, 2004, http://www.worldcat.org/isbn/1930934122.

[37] I. Wolf, M. Vetter, I. Wegner et al., "The medical imaging interaction toolkit," Medical Image Analysis, vol. 9, no. 6, pp. 594–604, 2005.

[38] M. Nolden, S. Zelzer, A. Seitel et al., "The medical imaging interaction toolkit: Challenges and advances: 10 years of open-source development," International Journal for Computer Assisted Radiology and Surgery, vol. 8, no. 4, pp. 607–620, 2013.

[39] W.-K. Jeong, H. Pfister, and M. Fatica, "Medical image processing using GPU-accelerated ITK image filters," GPU Computing Gems Emerald Edition, pp. 737–749, 2011.

[40] J. Blanchette and M. Summerfield, C++ GUI Programming with Qt 4, Prentice Hall, 2006.

[41] G. van Rossum, "Python programming language," in Proceedings of the 2007 USENIX Annual Technical Conference, J. Chase and S. Seshan, Eds., Santa Clara, CA, USA, 2007, https://www.usenix.org/publications/proceedings?f[0]=im_group_audience%3A114.

[42] A. Polukhin, Boost C++ Application Development Cookbook, Packt Publishing, 2nd edition, 2017.

[43] B. Dogdas, D. Stout, A. F. Chatziioannou, and R. M. Leahy, "Digimouse: a 3D whole body mouse atlas from CT and cryosection data," Physics in Medicine and Biology, vol. 52, no. 3, pp. 577–587, 2007.

[44] S. Roettger, http://lgdv.cs.fau.de/External/vollib/.

[45] P. F. Felzenszwalb and D. P. Huttenlocher, "Distance transforms of sampled functions," Theory of Computing: An Open Access Journal, vol. 8, pp. 415–428, 2012.


[46] K. E. Batcher, "Sorting networks and their applications," in Proceedings of the April 30–May 2, 1968, Spring Joint Computer Conference, p. 307, Atlantic City, New Jersey, April 1968.

[47] J. Barczak, OpenGL Is Broken, The Burning Basis Vector [blog], 2014, http://www.joshbarczak.com/blog/?p=154.

[48] T. K. V. W. Group, Vulkan 1.0.66 - A Specification, 2017, https://www.khronos.org/vulkan/.


computation is, however, not as ubiquitous as it could be, as the development of parallel algorithms can be prohibitively difficult, especially for those not familiar with the different programming model [10]. Initial use of GPGPU techniques involved interpreting data as textures and then performing typical graphics operations on them, such as blending, projection, or interpolation [11, 12].

As GPGPU techniques matured, dedicated libraries for GPU programming became common, of which the most well-known examples are CUDA [10] and OpenCL [13]. CUDA typically requires a separate compilation step and is thus poorly suited to rapid prototyping. In contrast, OpenGL [14], which is typically used as a backend for graphics and rendering, has long supported a run-time compiled shading language (GLSL [15, 16]). The GLSL source code is passed to the graphics driver, which then dynamically compiles an appropriate binary representation for the platform. OpenGL 4.3, released in 2012, added support for compute shaders, which facilitate arbitrary computation directly on the graphics card.

We exploit this dynamism to enable interactive development in GPU-accelerated computing and data exploration, with medical image processing in mind. Specifically, we want to be able to see results immediately, even while editing the source code. This concept is used today, for example, for entertainment in the so-called Demoscene [17–19]. We expect this approach to also enable intuition-driven development in the domain of medical image analysis.

2. Methods

We have created a computational framework that supports scriptable, GPU-accelerated computing and rendering of medical data and is designed for rapid prototyping. At the highest level of the user interface is a graph editor that controls the underlying graph-based processing pipeline: nodes perform operations on data, and edges between them indicate data flow (Figure 7). Internally, the framework consists of a scene graph describing hierarchies of objects. This is in principle the same architecture skeleton as used by real-time or game engines [20]. The architecture supports creating multiple scenes, performing rendering, and subsequently either displaying the result or processing it further. As an example, a volume (3D) texture can be rendered from different perspectives into a number of 2D textures via the integrated volume renderer. Since data processing happens via a flexible, user-controlled data pipeline, the resulting textures can be further processed.
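The graph-based pipeline described above, together with the downstream update behavior noted in the caption of Figure 7, can be illustrated with a small sketch. It is written in Python for brevity and is not the framework's Lua or C++ code. The class `Graph` and its methods `connect`, `mark_changed`, and `update` are hypothetical names, and each node here runs a plain Python function where real nodes would dispatch GPU work. Changing one node marks it and everything downstream of it as dirty, and an update recomputes only those nodes, in topological order.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, List, Set


class Graph:
    """Hypothetical dataflow graph: each node wraps a function of its upstream results."""

    def __init__(self) -> None:
        self.funcs: Dict[str, Callable[..., object]] = {}
        self.inputs: Dict[str, List[str]] = defaultdict(list)   # node -> upstream nodes, in argument order
        self.outputs: Dict[str, List[str]] = defaultdict(list)  # node -> downstream nodes
        self.values: Dict[str, object] = {}
        self.dirty: Set[str] = set()

    def add_node(self, name: str, func: Callable[..., object]) -> None:
        self.funcs[name] = func
        self.dirty.add(name)

    def connect(self, src: str, dst: str) -> None:
        self.inputs[dst].append(src)
        self.outputs[src].append(dst)
        self.mark_changed(dst)

    def mark_changed(self, name: str) -> None:
        """Flag a node and everything reachable downstream of it for recomputation."""
        stack, seen = [name], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            self.dirty.add(node)
            stack.extend(self.outputs[node])

    def topological_order(self) -> List[str]:
        """Kahn's algorithm; assumes the editor has already rejected cycles."""
        indegree = {n: len(self.inputs[n]) for n in self.funcs}
        queue = deque(n for n in self.funcs if indegree[n] == 0)
        order = []
        while queue:
            node = queue.popleft()
            order.append(node)
            for child in self.outputs[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    queue.append(child)
        return order

    def update(self) -> List[str]:
        """Recompute dirty nodes upstream-first and return the names that were recomputed."""
        recomputed = []
        for node in self.topological_order():
            if node in self.dirty:
                args = [self.values[u] for u in self.inputs[node]]
                self.values[node] = self.funcs[node](*args)
                recomputed.append(node)
        self.dirty.clear()
        return recomputed


# Usage: a three-node pipeline where only the threshold parameter changes.
params = {"t": 0.5}
g = Graph()
g.add_node("load", lambda: [0.1, 0.6, 0.9])
g.add_node("threshold", lambda xs: [1 if x >= params["t"] else 0 for x in xs])
g.add_node("count", lambda mask: sum(mask))
g.connect("load", "threshold")
g.connect("threshold", "count")
first = g.update()   # all three nodes are computed once
params["t"] = 0.95
g.mark_changed("threshold")
second = g.update()  # "load" is not recomputed
```

Recomputing only the dirty part of the graph is what keeps a parameter change cheap. In the example, `load` is evaluated once and its result is reused.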

Key features are very rapid prototyping and short iteration times, which allow results to be obtained quickly. Large parts of the software are scripted, and all of the scripts can be changed and reloaded at any time. As a result, many features, including the user interface (UI), can be changed without restarting the framework. GPU computation is realized with compute shaders, which can be changed and reloaded in the same way as the scripts. Input/output file formats are automatically detected, and support for new formats can be added via plugins. Multiple windows and screens are supported to maximize the usable space. This feature also provides support for more advanced display configurations, such as two-projector stereoscopic 3D setups or CAVEs [21].

Figure 1: Block diagram of the software. The components shown are core scripts, node scripts, the Lua API, ImGui, the application, the core library, plugins and the plugin API, Lua, SDL (system abstraction), OpenGL (GPU abstraction), the GPU driver, and the system. SDL provides most of the platform-dependent functionality; everything except the lowest software layer is completely platform-independent.
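Reloading scripts and shaders without a restart requires some way to notice that a file has been edited. The source does not say how this is detected. The sketch below assumes the simplest approach, polling each file's modification time once per frame; `ReloadableScript` is a hypothetical name. Operating-system file-change notifications would be an alternative that avoids polling.

```python
import os
import tempfile


class ReloadableScript:
    """Hypothetical helper that re-reads a script or shader file whenever its mtime changes."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.mtime_ns = None
        self.source = None
        self.reloads = 0

    def poll(self) -> bool:
        """Call once per frame; returns True if the file was (re)loaded by this call."""
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns == self.mtime_ns:
            return False
        with open(self.path, encoding="utf-8") as f:
            self.source = f.read()
        self.mtime_ns = mtime_ns
        self.reloads += 1
        # A real host would now re-run the Lua script or recompile the GLSL shader,
        # keeping the previous version active if compilation fails.
        return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "node.glsl")
    with open(path, "w", encoding="utf-8") as f:
        f.write("v = v * 2.0;")
    script = ReloadableScript(path)
    first = script.poll()      # initial load
    unchanged = script.poll()  # nothing changed since the last poll
    with open(path, "w", encoding="utf-8") as f:
        f.write("v = v * 3.0;")
    # Bump the timestamp explicitly so the demo does not depend on file-system mtime resolution.
    st = os.stat(path)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    changed = script.poll()    # the edited source is picked up
    latest = script.source
```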

Scripting is realized with Lua [22], which drives application and pipeline logic, the UI, and node functionality. The rest of the application and library is implemented in C++. Aside from Lua, the external libraries used include SDL [23] (for cross-platform support), OpenGL, and Dear ImGui [24] (for the UI). A custom plugin interface is implemented to enable support for extensions, third-party file formats, and additional Lua functions. The GPU backend is currently OpenGL version 4.5 [25], a cross-platform graphics API that combines rendering and GPU-accelerated computation. The implementation of the backend follows modern AZDO (approaching zero driver overhead [26]) principles where applicable. Accelerated processing is performed via compute shaders, which are written in GLSL. Due to the pipelined nature of OpenGL, most of its operations are performed in the background while the CPU can perform other tasks [25]. The core library also supports semi-automatic multithreading of CPU-bound tasks, but since the heavy compute jobs are usually performed by the GPU, multithreading is rarely necessary. Figure 1 shows an overview of the overall framework design.

From a developer's perspective, the following functions are provided. Vertex and fragment shaders facilitate the rendering tasks, and compute shaders perform the computational tasks. Shader introspection is used to determine the inputs, outputs, and parameters of the particular shaders; this information is used by the UI. Textures are used for image and data storage and can be one- to three-dimensional with 1–4 channels, using various internal formats (e.g., 8 bits to save space, 16 bits for high detail, and float for HDR data). Texture fetches can optionally be customized with a swizzle mask; that is, the order in which color components are sampled is user-controllable. GPU buffer objects provide support for arbitrary unformatted memory using persistent and coherent memory mapping, so they can be manipulated from either the CPU or the GPU side without special constraints. The following OpenGL extensions are automatically used when supported by the system: ARB_bindless_texture, ARB_gpu_shader_int64, ARB_gpu_shader5, memory info, and various robustness extensions to recover from driver crashes (ARB_robustness, ARB_robust_buffer_access_behavior, and ARB_create_context_robustness, as provided by SDL). Despite their benefits and simplicity, OpenGL compute shaders are not intended for very large datasets, as their running time is usually limited by the graphics driver; two seconds is the default for recent NVIDIA drivers on Windows. If a compute shader has not finished within that time limit, it is forcefully terminated. On Windows, the graphics driver is reset, which usually causes program termination. To overcome this limitation, large datasets are automatically split into smaller tiles, which are then processed individually in separate compute dispatches.
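The automatic tiling can be sketched as follows. This is an illustration under assumptions rather than the framework's code: the function `plan_tiles`, the tile size of 128 × 128 × 64 voxels, and the work-group size of 8 × 8 × 8 are hypothetical choices. Each entry of the plan would correspond to one `glDispatchCompute` call with the listed group counts. The tile offset would be passed to the shader, for example as a uniform, so that no single dispatch has to cover the whole volume.

```python
from itertools import product


def plan_tiles(shape, tile, local_size):
    """Split a 3D grid into tiles and compute the work-group count needed for each tile.

    shape, tile, local_size: (x, y, z) tuples in voxels. Returns a list of
    (offset, extent, groups) tuples; each entry would become one compute dispatch.
    """
    starts = [range(0, s, t) for s, t in zip(shape, tile)]
    plan = []
    for offset in product(*starts):
        # Edge tiles may be smaller than the nominal tile size.
        extent = tuple(min(t, s - o) for o, t, s in zip(offset, tile, shape))
        # Ceiling division: the last work group along an axis may be only partly used,
        # so the shader must skip invocations that fall outside `extent`.
        groups = tuple((e + l - 1) // l for e, l in zip(extent, local_size))
        plan.append((offset, extent, groups))
    return plan


plan = plan_tiles(shape=(256, 256, 176), tile=(128, 128, 64), local_size=(8, 8, 8))
```

With these parameters, the 256 × 256 × 176 volume becomes 12 dispatches. The last slab along z is only 48 voxels deep and therefore needs 6 work groups along that axis instead of 8. The tile size would be chosen so that a single tile finishes comfortably within the driver's time limit.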

A simple custom file format for storing image data of up to three dimensions is included. It supports lossless compression via the Zstandard algorithm [27] and is optimized for fast loading and simplicity, in order to keep the core clean. Two optional standard plugins are provided. The first one uses ITK [28] to add support for the Bitmap, JPEG, GDCM (DICOM), GIPL, MetaImage, Nrrd, TIFF, PNG, Stimulate, VTK, Nifti, and HDF5 [29] file formats; a full list can be found in the ITK wiki [30]. The second one uses the stb_image library [31] and adds support for PNG, Bitmap, TGA, HDR, JPEG, PSD (Photoshop), PNM/PPM, and GIF files. The two plugins are independent of each other, and we consider the latter in particular a good starting point for users trying to implement their own plugins. The API does not expose implementation details or the GPU backend and is version-compatible in both directions. Details about the plugin API can be found in the supplement. To facilitate scripting and script debugging, a built-in real-time data inspector is included. It can be used to traverse any Lua object along with its attached variables, functions, and classes, making it possible to preview values and data objects where supported.
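The on-disk layout of the custom format is not described here, so the following is only a sketch of the general idea: a small fixed-size header with the dimensions and channel layout, followed by a losslessly compressed payload. The magic number `XVOL` and the header fields are hypothetical. Python's standard `zlib` module stands in for Zstandard, which is not part of the Python standard library; a Zstandard binding would replace the two compression calls.

```python
import struct
import zlib

MAGIC = b"XVOL"  # hypothetical magic number, not the actual xcv format
HEADER = struct.Struct("<4s3IBB")  # magic, width, height, depth, channels, bytes per channel


def save_volume(dims, channels, bytes_per_channel, raw):
    """Pack a header and a zlib-compressed payload (zlib stands in for Zstandard)."""
    w, h, d = dims
    expected = w * h * d * channels * bytes_per_channel
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    header = HEADER.pack(MAGIC, w, h, d, channels, bytes_per_channel)
    return header + zlib.compress(raw, level=6)


def load_volume(blob):
    """Parse the header, decompress the payload, and check that the sizes agree."""
    magic, w, h, d, channels, bpc = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a volume file")
    raw = zlib.decompress(blob[HEADER.size:])
    if len(raw) != w * h * d * channels * bpc:
        raise ValueError("truncated or corrupt payload")
    return (w, h, d), channels, bpc, raw


raw = bytes(range(64))  # a 4 x 4 x 2 volume, 1 channel, 2 bytes per voxel
blob = save_volume((4, 4, 2), 1, 2, raw)
restored = load_volume(blob)
```

Keeping the header fixed-size and uncompressed lets a loader allocate the GPU texture before decompressing the payload, which fits the stated goal of fast loading.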

From a user's perspective, a number of nodes are already included, implementing the following filters/algorithms: curvature, derivative (edge detection), distance transform, various simple math operations (element-wise addition, subtraction, multiplication, division, power function, i.e., all functions supported by GLSL), min/max/average region filter, median filter, surface normal extraction, 3D → 2D slice extraction, convolution (Gaussian blur), type conversion, and thresholding. Two nodes accept a custom GLSL code snippet from the user, enabling live programming. The first compiles the entered code to a compute shader to process or generate arbitrary data. The second compiles it to a fragment shader that generates an arbitrary 2D texture, with limited compatibility to Shadertoy [17]. Other nodes that do not perform computation can act as data sources or sinks, although there is no clear separation between the two roles. Image or memory buffer loaders are pure sources. A volume renderer essentially transforms a 3D dataset into a 2D image, given a perspective. A universal memory viewer with an included hex editor (that works on CPU and GPU memory) is useful for diagnosing low-level memory layout problems.
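The custom compute node is described as compiling user-entered GLSL into a compute shader. One plausible way to do this, assumed here for illustration only, is to splice the snippet into a fixed GLSL 4.50 template. The template loads one texel into a variable `v`, runs the user code, and stores `v` back. The template, the image bindings `uInput`/`uOutput`, the `rgba32f` format, and `wrap_snippet` are all hypothetical. The resulting string would be handed to the driver through `glShaderSource` and `glCompileShader`.

```python
GLSL_TEMPLATE = """#version 450
layout(local_size_x = {lx}, local_size_y = {ly}, local_size_z = 1) in;
layout(binding = 0, rgba32f) uniform readonly image2D uInput;
layout(binding = 1, rgba32f) uniform writeonly image2D uOutput;

void main() {{
    ivec2 p = ivec2(gl_GlobalInvocationID.xy);
    // Skip invocations outside the image when its size is not a multiple of the group size.
    if (any(greaterThanEqual(p, imageSize(uOutput)))) {{
        return;
    }}
    vec4 v = imageLoad(uInput, p);
    // ---- user snippet begins ----
{snippet}
    // ---- user snippet ends ----
    imageStore(uOutput, p, v);
}}
"""


def wrap_snippet(snippet: str, local_size=(8, 8)) -> str:
    """Indent the user's GLSL lines and splice them into the hypothetical template."""
    body = "\n".join("    " + line for line in snippet.strip().splitlines())
    return GLSL_TEMPLATE.format(lx=local_size[0], ly=local_size[1], snippet=body)


src = wrap_snippet("v = clamp(v * 2.0, 0.0, 1.0);")
```

A fixed template keeps the snippet short enough to edit live, and the marker comments make it easy to map driver compile errors back to lines of user code.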

The layered architecture allows even novice users to design a processing pipeline visually and to interact with the provided widgets without any programming. For quick familiarization with the UI, context-sensitive help descriptions and tooltips, as well as a general guide, are displayed when appropriate. The novice user is only limited by their knowledge of what existing algorithms do, how to use them, and how to combine them to perform a higher-level task. More advanced users can quickly develop new nodes using GLSL for the computation and Lua for the interface; thus, computation is always GPU-accelerated, and scripting facilitates rapid development. This also enables users to quickly write prototype code for specific use cases. A node is implemented as a single Lua script, optionally containing GLSL code. Figure 5 shows a complete example. The supplement contains more information and examples detailing the node API and the implementation of custom nodes.

3. Results

3.1. User Interface. Our UI is designed for fast iteration, real-time interaction, and parameter adjustment. It aims to provide as much visual feedback as possible and to clearly highlight user errors when (and also how) they occur. Data manipulation effectively happens by linking nodes together to form a directed acyclic graph (DAG). Color-coded connectors prevent accidental type-incompatible connections, and it is not possible to construct cycles. Any attempt to perform these erroneous graph constructions leads to the immediate display of an appropriate error message (Figure 6).
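Both checks above, type-compatible connectors and no cycles, can be performed at the moment a connection is attempted, which is what allows the error to be shown immediately. The following minimal Python sketch uses hypothetical names (`NodeGraph`, `connect`, `GraphError`) and hypothetical type names. An edge `src -> dst` closes a cycle exactly when `src` is already reachable from `dst`, so a depth-first search from `dst` suffices. Replacing an existing connection on an occupied input is not modeled.

```python
class GraphError(ValueError):
    """Raised when a connection would be type-incompatible or create a cycle."""


class NodeGraph:
    """Hypothetical editor-side model: connector types are checked and cycles rejected on connect."""

    def __init__(self):
        self.out_types = {}  # (node, port) -> type name
        self.in_types = {}
        self.edges = {}      # node -> set of downstream nodes

    def add_node(self, node, inputs=None, outputs=None):
        for port, typ in (inputs or {}).items():
            self.in_types[(node, port)] = typ
        for port, typ in (outputs or {}).items():
            self.out_types[(node, port)] = typ
        self.edges.setdefault(node, set())

    def _reachable(self, start, target):
        """Depth-first search along existing edges."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def connect(self, src, out_port, dst, in_port):
        src_type = self.out_types[(src, out_port)]
        dst_type = self.in_types[(dst, in_port)]
        if src_type != dst_type:
            raise GraphError(f"type mismatch: {src}.{out_port} is {src_type}, "
                             f"{dst}.{in_port} expects {dst_type}")
        # Adding src -> dst closes a cycle exactly when src is already reachable from dst
        # (this also rejects self-loops, since src == dst is trivially reachable).
        if self._reachable(dst, src):
            raise GraphError(f"connecting {src} -> {dst} would create a cycle")
        self.edges[src].add(dst)


g = NodeGraph()
g.add_node("load", outputs={"out": "volume3d"})
g.add_node("blur", inputs={"in": "volume3d"}, outputs={"out": "volume3d"})
g.add_node("math", inputs={"a": "volume3d"}, outputs={"out": "volume3d"})
g.add_node("slice", inputs={"in": "volume3d"}, outputs={"out": "image2d"})
g.connect("load", "out", "blur", "in")
g.connect("blur", "out", "math", "a")
g.connect("blur", "out", "slice", "in")

errors = []
for attempt in [("slice", "out", "math", "a"),  # image2d into a volume3d input
                ("math", "out", "blur", "in")]:  # blur -> math -> blur would be a cycle
    try:
        g.connect(*attempt)
    except GraphError as exc:
        errors.append(str(exc))
```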

For developers writing their own nodes, automatic mechanisms attempt to predict input/output properties and how to invoke a compute shader for commonly used scenarios, simplifying development even more. Figure 5 shows an example, and the software documentation provides a more detailed explanation. If more customization is required, almost all functionality of a node can be implemented individually, including the UI.

3.2. Comparison to Existing Tools. We have compared our software package to MeVisLab [32], DeVIDE [33], GRAPE [34], GraphMIC [35], and FAST [4]. FAST is a stand-alone library for OpenCL-accelerated image manipulation and visualization. The other packages are mainly focused on graph-based image processing, and all of them use ITK, VTK [36], MITK [37, 38], or a variation of these libraries to perform the computation and rendering tasks. Consequently, they share ITK's main weakness, namely CPU-bound processing without (or with very limited) GPU acceleration [39]. All tools except MeVisLab are provided as open-source packages.

In comparison to our package, DeVIDE requires detailed knowledge of ITK, as it directly maps ITK functions to graph nodes. A version of the package that does not require ITK exists but has limited functionality. The DeVIDE UI is not intuitively usable, as nodes use internal names and their function is not always clear. Connectors are neither labelled nor color-coded, and any input can be connected to any output. Mismatched connectors or cycles cause an error only when trying to execute the graph. Likewise, missing parameters for a node are not signaled until an attempt is made to execute the graph, upon which an error is shown. We were unable to do anything meaningful with DeVIDE, since we could not identify a valid combination of nodes that, when connected together, would produce output without causing graph execution to fail. The UI has no apparent preview functionality. DeVIDE is scriptable in Python.


MeVisLab is a large, well-established commercial software package intended for rapid prototyping and image manipulation applications. It is rapid-prototyping friendly in the sense that results can be quickly obtained and previews at every stage provide visual feedback. However, implementing any custom extension requires the MeVisLab SDK and a C++ compiler; therefore, the actual development of custom extensions is not rapid-prototyping friendly. Constructing a graph containing a cycle resulted in a crash. MeVisLab is scriptable in Python.

GraphMIC is only available for macOS; thus, we decided not to test it, as we wanted to focus on platform-independent packages. From the documentation, it seems to be conceptually very similar to DeVIDE. However, it should be more rapid-prototyping and user-friendly, since it supports previews, parameters can be adjusted directly on the nodes, and input/output connectors are clearly labelled. This package is also scriptable in Python.

GRAPE is similar to GraphMIC and DeVIDE and is mentioned for completeness.

FAST is not a graph editor but a C++ library, similar to ITK and VTK, that aims to cover similar use cases. It uses OpenCL to accelerate image operations and OpenGL to render results. There is no scripting or UI, and in its current form it is not very rapid-prototyping friendly, since it targets C++ programmers only.

In conclusion, all of the listed graph-based tools are based on ITK and VTK. Other common dependencies are Qt [40], Python [41], and Boost [42]. These libraries are very large and can be a hassle to build or to get working properly. In contrast, our package depends on only a single external library, SDL, which is small and easy to build on many operating systems. We have tested our package on Windows and Linux, and macOS will be supported as soon as a working OpenGL 4.5 driver is available for it. Therefore, we expect that our solution will be the easiest to deploy among comparable packages, as long as the system's graphics driver supports at least OpenGL 4.5.

3.3. Usage Examples. The test system used for our benchmarks was a consumer notebook with an Intel Core i7-4790S CPU at 3.2 GHz (4 cores, 8 threads) and an NVIDIA GeForce GTX 965M graphics card with 4 GB of memory, running Windows 8.1. The software was compiled using Visual Studio 2015 Update 3. All pipelines shown below are included in the release package as examples. We do not include MRI source data due to data protection. Good sources for initial test data are the Digimouse [43] and V³ [44] datasets. Figure 2 shows some of the rendering modes possible with the built-in volume renderer node.

3.4. Distance Transform. We have performed a benchmark comparing the performance of a 3D distance transform implemented in multithreaded C++ and in GLSL. Specifically, we chose the fast distance transform method from [45]. The algorithm is not a pointwise operation (and therefore not trivially GPU-parallelizable), and it needs an initial preparation pass plus one pass for each axis (4 in total).

Figure 2: Selection of different renderings of CT data produced by the included volume renderer node and some supporting nodes. From left to right: solid with curvature as color, solid rescaled with curvature as color, edge/step function, solid rescaled, translucent.

The distance transform can be used for further processing. An example is given in Figure 3.
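The method of [45] computes, for each line of samples, the lower envelope of parabolas rooted at the sample values, which takes linear time per line. Applying it along x, y, and z after a preparation pass yields the squared Euclidean distance. The Python sketch below illustrates those four passes on a small nested-list volume. It is not the authors' C++ or GLSL implementation. Two choices are assumptions: voxels at or above the threshold are treated as the feature set, and a large finite value `BIG` is used in place of infinity. On a GPU, every line along the current axis is independent, so each line could plausibly be handled by its own shader invocation.

```python
import math

BIG = 1e20  # hypothetical stand-in for infinity; keeps the intersection arithmetic finite


def edt_1d(f):
    """Squared 1D distance transform of sampled function f (lower envelope of parabolas)."""
    n = len(f)
    v = [0] * n          # sample positions of the parabolas in the lower envelope
    z = [0.0] * (n + 1)  # boundaries between consecutive envelope parabolas
    z[0], z[1] = -math.inf, math.inf
    k = 0
    for q in range(1, n):
        while True:
            p = v[k]
            # Intersection of the parabolas rooted at q and at p.
            s = ((f[q] + q * q) - (f[p] + p * p)) / (2 * q - 2 * p)
            if s > z[k]:
                break
            k -= 1  # the parabola at p is hidden; drop it from the envelope
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = math.inf
    d = [0.0] * n
    k = 0
    for q in range(n):
        while z[k + 1] < q:
            k += 1
        p = v[k]
        d[q] = (q - p) ** 2 + f[p]
    return d


def distance_transform_3d(volume, threshold=0.5):
    """Euclidean distance to the nearest voxel >= threshold; volume is indexed [z][y][x]."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    # Preparation pass: 0 at feature voxels, BIG elsewhere.
    g = [[[0.0 if val >= threshold else BIG for val in row] for row in plane] for plane in volume]
    for zi in range(depth):  # pass along x
        for yi in range(height):
            g[zi][yi] = edt_1d(g[zi][yi])
    for zi in range(depth):  # pass along y
        for xi in range(width):
            col = edt_1d([g[zi][yi][xi] for yi in range(height)])
            for yi in range(height):
                g[zi][yi][xi] = col[yi]
    for yi in range(height):  # pass along z
        for xi in range(width):
            col = edt_1d([g[zi][yi][xi] for zi in range(depth)])
            for zi in range(depth):
                g[zi][yi][xi] = col[zi]
    return [[[math.sqrt(val) for val in row] for row in plane] for plane in g]


vol = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
vol[1][1][1] = 1.0  # a single feature voxel in the center
dist = distance_transform_3d(vol)
```

In the example, the center voxel has distance 0, its face neighbors have distance 1, and the corners have distance √3.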

Given the results in Table 1, we conclude that, for this specific computation, our GPU variant is about two orders of magnitude faster than the CPU-based 3D distance transform. This enables almost interactive operation (e.g., changing the threshold and observing the results). Figure 3 shows an extended use case: a custom compute node executes a snippet of GLSL code that uses the distance transform to remove most of a human skull. The entire snippet runs in less than 4 ms for a 256 × 256 × 176 volume. While the rendered result is not perfect and a bit noisy, the approach, including the parameters used for the transformation, was developed on the fly within a few minutes. Excluding the distance transform (which has to be computed only once) but including volume rendering of the result in stereo 3D, the pipeline takes less than 20 ms to execute.

3.5. Segmentation. A typical operation in medical image analysis is tissue segmentation. We have developed a simple proof-of-concept segmentation pipeline that extracts a specific intensity range, followed by a smoothing and threshold operation to form a mask. This mask is then used to segment skin and skull from the remaining tissue. Another quickly developed code snippet performs the remaining segmentation and contrast enhancement. For visualization, a slice is extracted (Figure 4). Note that the segmentation was performed on the complete 256 × 256 × 176 volume, not on a single slice only. The whole pipeline executes in about 161 ms for this volume, of which the final segmentation snippet took 4 ms. An alternative pipeline that processes the same volume in 22 ms is also included in the examples.
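The mask-building steps named above (intensity range, smoothing, threshold) can be sketched on a 2D slice. The band limits, the 3 × 3 box filter used as the smoothing step, and the threshold of 0.5 are hypothetical, because the source does not specify them. The actual pipeline operates on the full 3D volume on the GPU.

```python
def band_mask(image, lo, hi):
    """1.0 where the intensity lies inside [lo, hi], else 0.0."""
    return [[1.0 if lo <= v <= hi else 0.0 for v in row] for row in image]


def box_blur(image):
    """3 x 3 mean filter; border pixels average only the neighbors that exist."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, cnt = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx]
                        cnt += 1
            out[y][x] = acc / cnt
    return out


def threshold(image, t):
    return [[1 if v >= t else 0 for v in row] for row in image]


def segment(image, lo, hi, t=0.5):
    """Hypothetical mask pipeline: intensity band -> smoothing -> threshold."""
    return threshold(box_blur(band_mask(image, lo, hi)), t)


image = [[0.1] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        image[y][x] = 0.6  # a 3 x 3 block of "tissue" inside the band
image[0][6] = 0.6          # an isolated noise pixel that is also inside the band
mask = segment(image, lo=0.5, hi=0.7)
```

In the example, smoothing removes the isolated in-band pixel but also rounds off the corners of the block. Such side effects are why parameters like these are worth tuning interactively.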

3.6. Other Benchmarks. We have included two more operations in the benchmarks in Table 1: median filtering and intensity rescaling. Intensity rescaling is a two-step operation: first, the minimal and maximal intensity values are determined; then each pixel value is scaled accordingly, so that the output values are in [0, 1]. Simple point-wise operations like this can benefit greatly from GPU acceleration, as can other highly parallel operations such as Gaussian smoothing, arbitrary convolutions, edge detection, thresholding, and filtering. This kind of relatively simple operation can finish in a few milliseconds (ms) for typical input sizes (e.g., a volume of 512³ voxels), allowing fully interactive use and parameter


Figure 3: The distance transform from Table 1, combined with a custom compute node running the GLSL snippet on the right, is a simple way to remove most of the skull in a head MRI scan. The result is stereo-rendered in red/cyan.

Figure 4: Slice of a brain MRI scan. (a) is normalized but otherwise unprocessed; (b) is segmented into skull/skin (teal), cerebrospinal fluid (red), and brain (grey) tissue. The brain tissue is contrast-enhanced.

Table 1: Timings for selected operations on 3 different volumes. The speedup factor between CPU and GPU is calculated as the lowest CPU time (8 threads) divided by the GPU time. The 3D Euclidean distance transform operates on normalized volumes (voxel values in [0, 1]) with a solidity threshold of 0.5. The CPU implementation is our own, including the multithreading. Intensity rescaling and median filtering use ITK's CPU-based implementation. The median filter takes the direct neighborhood of each voxel into account, that is, a box of 27 voxels in total.

Volume CPU (1 thread) CPU (8 threads) GPU Speedup 1 versus 8 threads Speedup CPU versus GPUDistance transform

256 times 256 times 176 1345 s 459 s 630ms 29x 73x512 times 512 times 19 632 s 203 s 217ms 31x 93x380 times 992 times 208 1631 s 627 s 4900ms 26x 128x

Rescale intensity to [0 sdot sdot sdot 1]256 times 256 times 176 88ms 68ms 101ms 129x 67x512 times 512 times 19 38ms 30ms 72ms 127x 417x380 times 992 times 208 603ms 457ms 975ms 131x 47x

Median filter256 times 256 times 176 4155ms 1544ms 97ms 27x 159x512 times 512 times 19 2120ms 671ms 48ms 315x 14x380 times 992 times 208 14 s 4800ms 715ms 29x 67x

6 International Journal of Biomedical Imaging

Figure 5 A minimal working example This node calculates (V +add)exp for every pixel and color channel in a 2D image resultingin contrast enhancement The only Lua code is the definitiontable in the last line the rest is GLSL embedded in a Lua stringMissing interface functions are automatically inducedThe resultinggraphical representation is displayed in the small boxnode (darkbackground green title bar) More information can be found in thesupplement (available here)

adjustment in real-time (Figures 4 and 5) Median filteringon the GPU is costly due to the sorting involved sorting isimplemented using a parallel sorting network [46]
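Section 3.6 states only that the GPU median filter sorts with a parallel sorting network [46]. The Python sketch below illustrates that idea under our own assumptions; it is not the xcv GLSL implementation, and all names in it are hypothetical. It builds Batcher's merge-exchange network (as presented by Knuth, Algorithm 5.2.2M) for any input size and applies it to the 27 voxels of a 3 × 3 × 3 neighborhood, the box used by the median filter in Table 1. Because the order of compare-exchange steps depends only on the number of inputs and never on the values, every GPU thread would execute identical, branch-free code; which particular network xcv uses is not stated in the article.

```python
"""Median of a 3 x 3 x 3 neighborhood via a data-independent sorting network.

Illustrative CPU sketch (not xcv's GLSL code); all names are hypothetical.
"""
from typing import List, Sequence, Tuple

Network = List[Tuple[int, int]]


def batcher_network(n: int) -> Network:
    """Comparator pairs (i, j) with i < j for Batcher's merge-exchange sort
    (Knuth, TAOCP Vol. 3, Section 5.2.2, Algorithm M), valid for any n."""
    network: Network = []
    if n < 2:
        return network
    t = (n - 1).bit_length()  # smallest t with 2**t >= n
    p = 1 << (t - 1)
    while p > 0:
        q, r, d = 1 << (t - 1), 0, p
        while True:
            for i in range(n - d):
                if i & p == r:
                    network.append((i, i + d))
            if q == p:
                break
            d, q, r = q - p, q >> 1, p  # uses the old q on the right-hand side
        p >>= 1
    return network


# The comparator list depends only on the input size, so it is built once
# and every voxel (on a GPU: every thread) runs exactly the same sequence.
NETWORK_27 = batcher_network(27)


def apply_network(values: Sequence[float], network: Network) -> List[float]:
    """Run the compare-exchange steps in order; min/max keeps it branch-free."""
    v = list(values)
    for i, j in network:
        v[i], v[j] = min(v[i], v[j]), max(v[i], v[j])
    return v


def median_3x3x3(volume, x: int, y: int, z: int) -> float:
    """Median of the 27 voxels around (x, y, z), indexed volume[z][y][x].
    Clamping coordinates at the border is an assumption of this sketch."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    window = []
    for dz in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                zz = min(max(z + dz, 0), depth - 1)
                yy = min(max(y + dy, 0), height - 1)
                xx = min(max(x + dx, 0), width - 1)
                window.append(volume[zz][yy][xx])
    return apply_network(window, NETWORK_27)[13]  # middle element of 27
```

For a 3 × 3 × 3 volume whose voxels hold the values 0 to 26, `median_3x3x3(vol, 1, 1, 1)` returns 13.0. Clamping at the border is only one possible policy; the article does not describe how xcv handles volume borders.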

4. Discussion

We expect that our framework will make GPU-accelerated data processing more accessible. While the examples given so far mainly target use cases in the medical domain, our method is not limited to these and can be used for many kinds of 3D data processing that involve images or parallelizable operations on a block of data. The motivation to create a self-contained framework for rapid prototyping and built-in visualization came from difficulties when developing certain methods to operate on CT/MRI data. The intended algorithms were too costly to execute on CPUs in any reasonable time, so GPU acceleration was a necessity. Many time-consuming problems during development could have been avoided if not only the inspection of data had been supported visually, but also the in-memory representation had been easily accessible. Our prototype of the pipeline was hardcoded in C++ using plain OpenGL [47], without rapid prototyping functionality or automatic memory management; thus, changing parameters or rewiring pipeline connections required recompiling and rerunning the whole program, often causing problems due to technical oversights and remaining bugs. Our current framework attempts to solve these problems. It accelerates the once time-consuming part of pipeline development, so that the implementation cycles for new features are much shorter and also much more efficient.

Regarding the implementation, we intentionally rely solely on OpenGL for GPU acceleration because it is the most compatible, complete, and platform-independent graphics and compute API currently available. Other options either are vendor-locked (CUDA), have no graphics capabilities (OpenCL), or are exclusive to an operating system family (DirectX, Metal). OpenGL does not come without problems, however: GPU drivers are mostly proprietary, each implementing a different interpretation of the OpenGL specification, sometimes exposing implementation differences and bugs [4, 47]. This may affect the core implementation; therefore, care has been taken to adhere to the OpenGL 4.5 specification, but some drivers may still cause problems. In practice, this may also cause user-written GLSL shaders to work fine on one setup but fail to compile or misbehave on another, if the respective drivers' GLSL compilers exhibit differences.

We also deliberately chose not only to enable but also to enforce that all node computations are performed by the GPU. While this rules out incorporating popular libraries such as ITK and OpenCV, we believe that this is the only way to ensure a consistently fast pipeline without unnecessarily slow legacy components. The current focus/use case of the application is novel development, rather than reusing existing components in a new package. This is not expected to be a problem for most users, as this is what sets our framework apart from others.

Given the ability to type in code at run time and inspect results immediately, intuition-guided exploration of data is much easier and more interactive than with existing solutions, which mostly offer premade building blocks. The combination of rapid prototyping throughout our entire implementation with GPU acceleration of all calculations allows a user to process larger datasets faster than with any of the other tools, as these were apparently not built with this kind of dynamism in mind.

A selection of algorithms is included in the first public release. They might not be sufficient to cover all use cases; therefore, use in medical research is not yet the primary focus of our package. However, even at this early stage of development, our software is useful for teaching and can be extended for specific tasks by anyone.

4.1. Future Work. In its current state, the software is a stand-alone application rather than a library. Future functionality may include a program exporter: users would design entire processing pipelines inside the user interface and then export a single script that implements this dataflow graph. This would be useful for batch processing and for inclusion in existing pipelines, and is expected to greatly simplify deployment for end users. We also plan to include support for point clouds and regular 3D meshes, as they are strongly tied to graphics development and benefit greatly from GPU-accelerated processing. We believe that combining the ability to manipulate and visualize these types of data in one package will be beneficial to other related fields. Moving away from OpenGL is not a plan for the immediate future but a long-term goal. Switching to Vulkan [48] as a backend would not only enable multithreaded multi-GPU computation, but also enable support for mobile devices (e.g., Android-based tablets), and would hopefully minimize driver-specific behavior compared to OpenGL.


Figure 6: Comparison of our graph editor (b) with the interface of DeVIDE (a). Our graph editor shows much more detail, including a preview when hovering over inputs/outputs, where possible. It also checks for misuse, that is, it ensures connector type compatibility and prevents cycles. The depicted DeVIDE graph did not work despite trying multiple variants. There was no sign of error when building the graph.
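Figure 6 mentions two checks the graph editor performs before accepting a connection: matching connector types and the absence of cycles. The following Python sketch shows one straightforward way to implement both checks. The classes, the connector type strings, and the rule that a new connection replaces an existing one on the same input are hypothetical and do not describe xcv's actual Lua/C++ code.

```python
"""Connection checks for a dataflow-graph editor (hypothetical sketch)."""
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class GraphError(ValueError):
    """Raised when a requested connection would make the graph invalid."""


@dataclass
class Node:
    name: str
    inputs: Dict[str, str]   # input connector -> data type, e.g. "volume"
    outputs: Dict[str, str]  # output connector -> data type


@dataclass
class Graph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    # Each edge is (source node, output connector, target node, input connector).
    edges: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def add(self, node: Node) -> None:
        self.nodes[node.name] = node

    def reaches(self, start: str, goal: str) -> bool:
        """Depth-first search: is `goal` reachable from `start` along edges?"""
        stack, seen = [start], set()
        while stack:
            current = stack.pop()
            if current == goal:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(dst for src, _, dst, _ in self.edges if src == current)
        return False

    def connect(self, src: str, out: str, dst: str, inp: str) -> None:
        out_type = self.nodes[src].outputs[out]
        in_type = self.nodes[dst].inputs[inp]
        # 1. Type check: both connector types must match exactly.
        if out_type != in_type:
            raise GraphError(
                f"type mismatch: {src}.{out} ({out_type}) -> {dst}.{inp} ({in_type})")
        # 2. Cycle check: an edge src -> dst closes a cycle iff dst already
        #    reaches src (this also rejects connecting a node to itself).
        if self.reaches(dst, src):
            raise GraphError(f"connecting {src} -> {dst} would create a cycle")
        # An input holds at most one connection; a new one replaces the old.
        self.edges = [e for e in self.edges if (e[2], e[3]) != (dst, inp)]
        self.edges.append((src, out, dst, inp))
```

Rejecting an invalid edge at connection time, rather than when the graph is executed, is what makes an immediate error message possible, in contrast to the DeVIDE behavior noted in Figure 6.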

Figure 7: A dataflow graph to visualize a mouse CT volume. Both “3D Volume” nodes render a volume texture into a 2D image, given individual settings. In this example, the resulting 2D image is set to 4K resolution (3840 × 2160). An estimate of the time in milliseconds that a node spent computing on the GPU is displayed on each node. Any parameter/image changes propagate downstream, so that the whole graph updates itself as necessary.
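Figure 7 describes changes propagating downstream so that the graph updates itself as necessary. A minimal Python sketch of such an update scheme, assuming that nodes are pure functions of their inputs and parameters: changing a parameter marks the node and all of its transitive consumers as dirty, and an update recomputes only the dirty nodes in topological order, using `graphlib.TopologicalSorter` from the Python standard library (3.9+). The `Pipeline` class and its methods are hypothetical; the article does not describe xcv's scheduling at this level of detail.

```python
"""Downstream invalidation and recomputation in a dataflow graph (sketch)."""
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, List, Set


class Pipeline:
    """Hypothetical pipeline: each node is a function of its input nodes'
    values plus keyword parameters."""

    def __init__(self) -> None:
        self.funcs: Dict[str, Callable[..., Any]] = {}
        self.inputs: Dict[str, List[str]] = {}
        self.params: Dict[str, Dict[str, Any]] = {}
        self.values: Dict[str, Any] = {}
        self.dirty: Set[str] = set()
        self.log: List[str] = []  # names of recomputed nodes, in order

    def add(self, name: str, func: Callable[..., Any], inputs=(), **params) -> None:
        self.funcs[name] = func
        self.inputs[name] = list(inputs)
        self.params[name] = dict(params)
        self.dirty.add(name)  # a new node has no cached value yet

    def downstream(self, name: str) -> Set[str]:
        """The node itself plus every node that (transitively) consumes it."""
        found, stack = set(), [name]
        while stack:
            current = stack.pop()
            if current not in found:
                found.add(current)
                stack.extend(n for n, ins in self.inputs.items() if current in ins)
        return found

    def set_param(self, name: str, key: str, value: Any) -> None:
        self.params[name][key] = value
        self.dirty |= self.downstream(name)  # propagate the change downstream

    def update(self) -> None:
        """Recompute only dirty nodes, each one after all of its inputs."""
        graph = {n: set(ins) for n, ins in self.inputs.items()}
        for name in TopologicalSorter(graph).static_order():
            if name in self.dirty:
                args = [self.values[i] for i in self.inputs[name]]
                self.values[name] = self.funcs[name](*args, **self.params[name])
                self.log.append(name)
        self.dirty.clear()
```

For a chain `source → scale → total`, changing the `factor` parameter of `scale` and calling `update()` recomputes only `scale` and `total`; the cached output of `source` is reused. Caching intermediate results in this way is what keeps repeated updates cheap when only one parameter changes.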

Data Availability

Program and source code are accessible at https://bitbucket.org/maxmalek/xcv. The mouse volume used in some of the examples was converted from the Digimouse dataset [43]. The human head MRI data set was donated under the condition of anonymity and cannot be published.

Conflicts of Interest

There are no conflicts of interest, financial or otherwise, associated with this paper.

Acknowledgments

This work was supported by the TU Graz Open Access Publishing Fund.

Supplementary Materials

The supplement is a quick-start and developer's guide for our software, xcv. It describes how to extend the program, the related APIs, and good GLSL practices to maximize compatibility across graphics drivers. The supplement also provides a hands-on example for the implementation of a new node type. (Supplementary Materials)

References

[1] D. Fortmeier, Direct volume rendering methods for needle insertion simulation [Ph.D. dissertation], University of Lübeck, Germany, 2016, http://www.zhb.uni-luebeck.de/epubs/ediss1748.pdf.

[2] J. Zhou, X. Wang, H. Cui et al., “Topology-aware illumination design for volume rendering,” BMC Bioinformatics, vol. 17, no. 1, 2016.

[3] R. Srinivasan and S. Fang, “Integrating volume morphing and visualization,” Computational Geometry, vol. 15, no. 1–3, pp. 149–159, 2000.

[4] E. Smistad, M. Bozorgi, and F. Lindseth, “FAST: framework for heterogeneous medical image computing and visualization,” International Journal of Computer Assisted Radiology and Surgery, vol. 10, no. 11, pp. 1811–1822, 2015.

[5] M. A. Akhloufi, F. Gariepy, and G. Champagne, “GPGPU real-time texture analysis framework,” in Proceedings of the Conference on Parallel Processing for Imaging Applications 2011, J. D. Owens, I. Lin, Y. Zhang, and G. B. Beretta, Eds., vol. 7872, pp. 7872-7872, San Francisco Airport, CA, USA, 2011.


[6] R. P. Broussard and R. W. Ives, “Using a commercial graphical processing unit and the CUDA programming language to accelerate scientific image processing applications,” in Proceedings of the Conference on Parallel Processing for Imaging Applications 2011, J. D. Owens, I. Lin, Y. Zhang, and G. B. Beretta, Eds., SPIE Proceedings, San Francisco Airport, California, USA.

[7] A. Eklund, M. Andersson, and H. Knutsson, “fMRI analysis on the GPU – Possibilities and challenges,” Computer Methods and Programs in Biomedicine, vol. 105, no. 2, pp. 145–161, 2012.

[8] F. Dachille, K. Kreeger, B. Chen, I. Bitter, and A. E. Kaufman, “High-quality volume rendering using texture mapping hardware,” in Proceedings of the 1998 ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, A. E. Kaufman, W. Straßer, G. Knittel, H. Pfister, and S. N. Spencer, Eds., vol. 31, Lisbon, Portugal, 1998.

[9] M. D. Hanwell, K. M. Martin, A. Chaudhary, and L. S. Avila, “The Visualization Toolkit (VTK): Rewriting the rendering code for modern graphics cards,” SoftwareX, vol. 1-2, pp. 9–12, 2015.

[10] J. Sanders and E. Kandrot, CUDA by example: an introduction to general purpose GPU programming, Addison-Wesley, 2011, http://www.worldcat.org/oclc/699702402.

[11] R. Yang and G. Welch, “Fast image segmentation and smoothing using commodity graphics hardware,” Journal of Graphics (GPU & Game) Tools, vol. 7, no. 4, pp. 91–100, 2002.

[12] F. D. Igual, R. Mayo, T. D. R. Hartley, U. V. Catalyurek, A. Ruiz, and M. Ujaldon, “Exploring the GPU for enhancing parallelism on color and texture analysis,” in Parallel Computing: From Multicores and GPUs to Petascale, Proceedings of the Conference ParCo, B. M. Chapman, F. Desprez, G. R. Joubert, A. Lichnewsky, F. J. Peters, and T. Priol, Eds., vol. 19, pp. 299–306, IOS Press, 2009.

[13] Khronos OpenCL Working Group, The OpenCL Specification, Version 2.2, 2017, https://www.khronos.org/opencl.

[14] P. Cozzi and C. Riccio, OpenGL Insights, CRC Press, 2012, http://www.openglinsights.com.

[15] M. Bailey and S. Cunningham, “Computer graphics shaders using OpenGL 4.x,” in Proceedings of the ACM SIGGRAPH ASIA 2010 Courses, pp. 1–173, Seoul, Republic of Korea, December 2010.

[16] M. Bailey and S. Cunningham, Eds., Graphics Shaders: Theory and Practice, Taylor & Francis, 2nd edition, 2011.

[17] P. Jeremias and I. Quilez, “Shadertoy: Learn to create everything in a fragment shader,” in Proceedings of the SIGGRAPH Asia 2014 Courses, pp. 1–15, New York, NY, USA, December 2014.

[18] M. Reunanen, Times of change in the demoscene: A creative community and its relationship with technology [Ph.D. dissertation], University of Turku, 2017.

[19] D. Hartmann, Digital Art Natives: Praktiken, Artefakte und Strukturen der Computer-Demoszene, Kulturverlag Kadmos, Berlin, http://digitalartnatives.de.

[20] H. Marin-Vega, G. Alor-Hernandez, R. Zatarain-Cabada, M. L. Barron-Estrada, and J. L. García-Alcaraz, “A Brief Review of Game Engines for Educational and Serious Games Development,” Journal of Information Technology Research, vol. 10, no. 4, pp. 1–22, 2017.

[21] A. L. Turinsky, E. Fanea, Q. Trinh et al., “CAVEman: Standardized anatomical context for biomedical data mapping,” Anatomical Sciences Education, vol. 1, no. 1, pp. 10–18, 2008.

[22] R. Ierusalimschy, L. H. de Figueiredo, and W. C. Filho, “Lua – an extensible extension language,” Software: Practice and Experience, vol. 26, no. 6, pp. 635–652, 1996.

[23] S. Lantinga, Simple DirectMedia Layer, 2017, http://libsdl.org.

[24] O. Cornut, https://github.com/ocornut/imgui.

[25] Khronos OpenGL Working Group, The OpenGL Graphics System: A Specification (Version 4.5 (Core Profile)), https://www.khronos.org/registry/OpenGL/specs/gl/glspec45.core.pdf.

[26] C. Everitt, G. Sellers, J. McDonald, and T. Foley, https://www.slideshare.net/CassEveritt/approaching-zero-driver-overhead.

[27] Y. Collet, Zstandard, 2015, http://zstd.net.

[28] H. J. Johnson, M. McCormick, L. Ibanez, and The Insight Software Consortium, The ITK Software Guide, Kitware Inc., 3rd edition, https://itk.org/ItkSoftwareGuide.pdf.

[29] The HDF Group (1997–2017), Hierarchical Data Format, version 5, http://www.hdfgroup.org/HDF5.

[30] https://itk.org/Wiki/ITK/File_Formats.

[31] S. Barrett, https://github.com/nothings/stb.

[32] F. Ritter, T. Boskamp, A. Homeyer et al., “Medical image analysis,” IEEE Pulse, vol. 2, no. 6, pp. 60–70, 2011.

[33] C. P. Botha, “DeVIDE: The Delft visualisation and image processing development environment,” Tech. Rep., Delft Technical University, 2004, https://graphics.tudelft.nl/Publications-new/2004/BO04a.

[34] R. E. Gabr, G. B. Tefera, W. J. Allen, A. S. Pednekar, and P. A. Narayana, “Erratum to: GRAPE: a graphical pipeline environment for image analysis in adaptive magnetic resonance imaging,” International Journal of Computer Assisted Radiology and Surgery, vol. 12, no. 3, pp. 459-459, 2017.

[35] A. E. Szalo, A. Zehner, and C. Palm, “GraphMIC,” in Bildverarbeitung für die Medizin 2015, Informatik aktuell, pp. 395–400, Springer Berlin Heidelberg, Berlin, Heidelberg, 2015.

[36] W. Schroeder, K. Martin, and B. Lorensen, The Visualization Toolkit, Kitware Inc., 3rd edition, 2004, http://www.worldcat.org/isbn/1930934122.

[37] I. Wolf, M. Vetter, I. Wegner et al., “The medical imaging interaction toolkit,” Medical Image Analysis, vol. 9, no. 6, pp. 594–604, 2005.

[38] M. Nolden, S. Zelzer, A. Seitel et al., “The medical imaging interaction toolkit: Challenges and advances: 10 years of open-source development,” International Journal of Computer Assisted Radiology and Surgery, vol. 8, no. 4, pp. 607–620, 2013.

[39] W.-K. Jeong, H. Pfister, and M. Fatica, “Medical image processing using GPU-accelerated ITK image filters,” in GPU Computing Gems, Emerald Edition, pp. 737–749, 2011.

[40] J. Blanchette and M. Summerfield, C++ GUI Programming with Qt 4, Prentice Hall, 2006.

[41] G. van Rossum, “Python programming language,” in Proceedings of the 2007 USENIX Annual Technical Conference, J. Chase and S. Seshan, Eds., Santa Clara, CA, USA, 2007, https://www.usenix.org/publications/proceedings?f[0]=im_group_audience%3A114.

[42] A. Polukhin, Boost C++ Application Development Cookbook, Packt Publishing, 2nd edition, 2017.

[43] B. Dogdas, D. Stout, A. F. Chatziioannou, and R. M. Leahy, “Digimouse: a 3D whole body mouse atlas from CT and cryosection data,” Physics in Medicine and Biology, vol. 52, no. 3, pp. 577–587, 2007.

[44] S. Roettger, http://lgdv.cs.fau.de/External/vollib.

[45] P. F. Felzenszwalb and D. P. Huttenlocher, “Distance transforms of sampled functions,” Theory of Computing: An Open Access Journal, vol. 8, pp. 415–428, 2012.


[46] K. E. Batcher, “Sorting networks and their applications,” in Proceedings of the April 30–May 2, 1968, Spring Joint Computer Conference, p. 307, Atlantic City, New Jersey, April 1968.

[47] J. Barczak, OpenGL Is Broken, The Burning Basis Vector [blog], 2014, http://www.joshbarczak.com/blog/?p=154.

[48] The Khronos Vulkan Working Group, Vulkan 1.0.66 - A Specification, 2017, https://www.khronos.org/vulkan.



36 Other Benchmarks We have included two more exam-ples for point-wise operations into the benchmarks in Table 1median filtering and intensity rescaling Intensity rescaling isa two-step operation First minimal and maximal intensityvalue are determined and then each pixel value is scaledaccordingly so that the output values are in [0 sdot sdot sdot 1] Simplepoint-wise operations like this can benefit even more fromGPU acceleration such as Gaussian smoothing arbitraryconvolutions edge detection thresholding and filteringThis kind of relatively simple operation can finish in a fewmilliseconds (ms) for typical input sizes (eg a volume ofsize 5123 voxels) allowing fully interactive use and parameter

International Journal of Biomedical Imaging 5

Figure 3The distance transform from Table 1 combined with a custom compute node running the GLSL snippet on the right is a simple wayto remove most of the skull in a head MRI scan The result is redcyan stereo-rendered

(a) (b)

Figure 4 Slice of a brain MRI scan (a) is normalized but otherwise unprocessed (b) is segmented into skullskin (teal) cerebrospinal fluid(red) and brain (grey) tissue The brain tissue is contrast-enhanced

Table 1 Timings for selected operations on 3 different volumes The speedup factor between CPU and GPU is calculated as the lowest CPUtime (8 threads) divided by the GPU time The 3D Euclidian distance transform operates on normalized (pixel values in [0 sdot sdot sdot 1]) volumeswith a solidity threshold of 05 The CPU implementation is our own including the multithreading Intensity rescaling and median filteringuse ITKrsquos CPU-based implementation The median filter takes the direct neighborhood of each voxel into account that is a box of 27 voxelsin total

Volume CPU (1 thread) CPU (8 threads) GPU Speedup 1 versus 8 threads Speedup CPU versus GPUDistance transform

256 times 256 times 176 1345 s 459 s 630ms 29x 73x512 times 512 times 19 632 s 203 s 217ms 31x 93x380 times 992 times 208 1631 s 627 s 4900ms 26x 128x

Rescale intensity to [0 sdot sdot sdot 1]256 times 256 times 176 88ms 68ms 101ms 129x 67x512 times 512 times 19 38ms 30ms 72ms 127x 417x380 times 992 times 208 603ms 457ms 975ms 131x 47x

Median filter256 times 256 times 176 4155ms 1544ms 97ms 27x 159x512 times 512 times 19 2120ms 671ms 48ms 315x 14x380 times 992 times 208 14 s 4800ms 715ms 29x 67x

6 International Journal of Biomedical Imaging

Figure 5 A minimal working example This node calculates (V +add)exp for every pixel and color channel in a 2D image resultingin contrast enhancement The only Lua code is the definitiontable in the last line the rest is GLSL embedded in a Lua stringMissing interface functions are automatically inducedThe resultinggraphical representation is displayed in the small boxnode (darkbackground green title bar) More information can be found in thesupplement (available here)

adjustment in real-time (Figures 4 and 5) Median filteringon the GPU is costly due to the sorting involved sorting isimplemented using a parallel sorting network [46]

4 Discussion

We expect that our framework will make GPU-accelerated data processing more accessible. While the examples given so far mainly target use cases in the medical domain, our method is not limited to these and can be used for many kinds of 3D data processing that involve images or parallelizable operations on a block of data. The motivation to create a self-contained framework for rapid prototyping with built-in visualization came from difficulties when developing certain methods to operate on CT/MRI data. The intended algorithms were too costly to execute on CPUs in any reasonable time, so GPU acceleration was a necessity. Many time-consuming problems during development could have been avoided if not only the inspection of data had been supported visually, but also the in-memory representation had been easily accessible. Our prototype of the pipeline was hardcoded in C++ using plain OpenGL [47], without rapid prototyping functionality or automatic memory management; thus, changing parameters or rewiring pipeline connections required recompiling and rerunning the whole program, often causing problems due to technical oversights and remaining bugs. Our current framework attempts to solve these problems. It accelerates the once time-consuming part of pipeline development, so that the implementation cycles for new features are much shorter and also much more efficient.

Regarding the implementation, we intentionally rely solely on OpenGL for GPU acceleration, because it is the most compatible, complete, and platform-independent graphics and compute API currently available. Other options either are vendor-locked (CUDA), have no graphics capabilities (OpenCL), or are exclusive to an operating system family (DirectX, Metal). OpenGL does not come without problems, however: GPU drivers are mostly proprietary, each implementing a different interpretation of the OpenGL specification, sometimes exposing implementation differences and bugs [4, 47]. This may affect the core implementation; therefore, care has been taken to adhere to the OpenGL 4.5 specification, but some drivers may still cause problems. In practice, this may also cause user-written GLSL shaders to work fine on a specific setup but fail to compile or misbehave on another, if the respective drivers' GLSL compilers exhibit differences.
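One way to reduce this kind of driver dependence is to assemble every run-time compiled shader from a fixed preamble that states the GLSL version and profile explicitly instead of relying on driver defaults; per the GLSL specification, a shader without a `#version` directive is treated as version 1.10, which has no compute shaders. The C++ sketch below illustrates the idea. The helper name, its parameters, and the decision to reject a user-supplied `#version` are assumptions for illustration, not the framework's actual API; the GLSL practices the authors recommend are described in the supplement.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: wrap a user-written GLSL compute snippet in a fixed
// preamble before it is handed to the driver's GLSL compiler.
std::string buildComputeShader(const std::string& userSnippet,
                               int localX, int localY, int localZ) {
    assert(localX > 0 && localY > 0 && localZ > 0);
    // #version must precede all other code, so a snippet that declares its
    // own version cannot simply be appended after our preamble.
    if (userSnippet.find("#version") != std::string::npos)
        throw std::invalid_argument("snippet must not contain #version");

    std::string src = "#version 450 core\n";  // pin the language version and core profile
    src += "layout(local_size_x = " + std::to_string(localX) +
           ", local_size_y = " + std::to_string(localY) +
           ", local_size_z = " + std::to_string(localZ) + ") in;\n";
    src += userSnippet;
    if (src.back() != '\n') src += '\n';  // keep concatenated sources line-aligned
    return src;
}
```

Centralizing the preamble in one place means a driver-specific workaround, if one is ever needed, only has to be applied once rather than in every user snippet.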

We also deliberately chose not only to enable but also to enforce that all node computations are performed by the GPU. While this rules out incorporating popular libraries such as ITK and OpenCV, we believe that this is the only way to ensure a consistently fast pipeline without unnecessarily slow legacy components. The application currently focuses on novel development rather than on reusing existing components in a new package. This is not expected to be a problem for most users, as this is what sets our framework apart from others.

Given the ability to type in code at run-time and inspect results immediately, intuition-guided exploration of data is much easier and more interactive than with existing solutions, which mostly offer premade building blocks. Combining rapid prototyping throughout our entire implementation with GPU acceleration of all calculations allows a user to process larger datasets faster than with the other tools discussed above, as these were apparently not built with this kind of dynamism in mind.

A selection of algorithms is included in the first public release. They might not be sufficient to cover all use cases; therefore, use in medical research is not yet the primary focus of our package. However, even at this early stage of development, our software is useful for teaching and can be extended for specific tasks by anyone.
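The node in Figure 5 shows how small such an extension can be. To make its arithmetic explicit, the sketch below reproduces the per-pixel operation $(V + add)^{exp}$ as a plain C++ loop, the kind of CPU reference one might write to check a new GLSL node against. The function name, the interleaved RGBA float layout, and treating the alpha channel like the color channels are assumptions; the actual node is the GLSL snippet shown in Figure 5.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference for the arithmetic of the Figure 5 node, (V + add)^exp,
// applied independently to every channel of every pixel.
std::vector<float> contrastEnhance(const std::vector<float>& rgba,
                                   float add, float exponent) {
    assert(rgba.size() % 4 == 0);  // four interleaved channels per pixel
    std::vector<float> out(rgba.size());
    for (std::size_t i = 0; i < rgba.size(); ++i) {
        // GLSL's pow(x, y) is undefined for x < 0, and std::pow returns NaN
        // for a negative base with a non-integer exponent: keep V + add >= 0.
        out[i] = std::pow(rgba[i] + add, exponent);
    }
    return out;
}
```

With add = 0 and exp > 1, values in [0, 1] are pushed towards zero while 1 stays fixed, darkening mid-tones relative to highlights; exp < 1 has the opposite effect.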

4.1. Future Work. In its current state, the software is a stand-alone application rather than a library. Future functionality may include a program exporter, making it possible to design entire processing pipelines inside the user interface and then export a single script that implements this dataflow graph. This would be useful for batch processing and for inclusion in existing pipelines, and is expected to greatly simplify deployment for end users. We also plan to include support for point clouds and regular 3D meshes, as they are strongly tied to graphics development and benefit greatly from GPU-accelerated processing. We believe that combining the ability to manipulate and visualize these types of data in one package will be beneficial to other related fields. Moving away from OpenGL is not a plan for the immediate future, but a long-term goal. Switching to Vulkan [48] as a backend would not only enable multithreaded, multi-GPU computation, but also enable support for mobile devices (e.g., Android-based tablets), and would hopefully minimize driver-specific behavior when compared to OpenGL.
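Such an exporter essentially has to linearize the dataflow graph: every node must be emitted after the nodes whose outputs it consumes. A minimal sketch, assuming Kahn's topological sort and a hypothetical Lua-like output format; the node structure, operation names, and emitted syntax are all illustrative, since no exporter exists in the current release.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical node description: an operation name plus the ids of the
// nodes whose outputs it consumes.
struct Node {
    std::string op;
    std::vector<int> inputs;
};

// Emit one script line per node in dependency order (Kahn's algorithm).
std::vector<std::string> exportScript(const std::map<int, Node>& graph) {
    std::map<int, int> pending;                 // number of unresolved inputs per node
    std::map<int, std::vector<int>> consumers;  // source node -> consuming nodes
    for (const auto& [id, node] : graph) {
        pending[id] += static_cast<int>(node.inputs.size());
        for (int src : node.inputs) consumers[src].push_back(id);
    }
    std::queue<int> ready;  // nodes whose inputs have all been emitted
    for (const auto& [id, n] : pending)
        if (n == 0) ready.push(id);

    std::vector<std::string> lines;
    while (!ready.empty()) {
        int id = ready.front();
        ready.pop();
        const Node& node = graph.at(id);
        std::string line = "local n" + std::to_string(id) + " = " + node.op + "(";
        for (std::size_t i = 0; i < node.inputs.size(); ++i)
            line += (i ? ", n" : "n") + std::to_string(node.inputs[i]);
        lines.push_back(line + ")");
        for (int c : consumers[id])
            if (--pending[c] == 0) ready.push(c);
    }
    // Nodes left over are part of a cycle or depend on a missing node.
    if (lines.size() != graph.size())
        throw std::runtime_error("graph contains a cycle or a dangling input");
    return lines;
}
```

Because the emitted order only depends on the graph's structure, the same script could be rerun unattended on a batch of input volumes.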

International Journal of Biomedical Imaging 7

Figure 6: Comparison of our graph editor (b) with the interface of DeVIDE (a). Our graph editor shows considerably more detail, including a preview when hovering over inputs and outputs, if possible. It also checks for misuse; that is, it ensures connector type compatibility and prevents cycles. The depicted DeVIDE graph did not work despite trying multiple variants, and there was no sign of error when building the graph.
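The two checks mentioned in the caption can be expressed compactly. In a directed acyclic graph, adding an edge from node u to node v closes a cycle exactly when v can already reach u, so a depth-first search before each connection is sufficient. The sketch assumes that connector types can be compared as plain strings and that edges are stored per node; both are simplifications for illustration, not the editor's actual data model, and the class name is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical editor-side graph: each edge connects an output of `from`
// to an input of `to`.
class NodeGraph {
public:
    // Adds the edge and returns true only if the connector types match
    // and the edge would not close a cycle.
    bool connect(int from, const std::string& outType,
                 int to, const std::string& inType) {
        if (outType != inType) return false;               // type compatibility
        if (from == to || reaches(to, from)) return false; // would create a cycle
        edges_[from].insert(to);
        return true;
    }

private:
    // Iterative depth-first search: can `target` be reached from `start`?
    bool reaches(int start, int target) const {
        std::vector<int> stack{start};
        std::set<int> seen;
        while (!stack.empty()) {
            int n = stack.back();
            stack.pop_back();
            if (n == target) return true;
            if (!seen.insert(n).second) continue;  // already expanded
            auto it = edges_.find(n);
            if (it != edges_.end())
                for (int next : it->second) stack.push_back(next);
        }
        return false;
    }

    std::map<int, std::set<int>> edges_;  // node -> downstream nodes
};
```

Rejecting the connection up front keeps the graph acyclic at all times, so later stages such as evaluation order never need to handle cycles.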

Figure 7: A dataflow graph to visualize a mouse CT volume. Both “3D Volume” nodes render a volume texture into a 2D image, given individual settings. In this example, the resulting 2D image is set to 4K resolution (3840 × 2160). An estimate of the time in milliseconds that a node spent computing on the GPU is displayed on each node. Any parameter or image changes propagate downstream, so that the whole graph updates itself as necessary.
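Downstream propagation of this kind amounts to recomputing the changed node and everything reachable from it, in an order where each node follows its inputs. A sketch under the assumption that the graph is acyclic (which the editor enforces, as noted for Figure 6) and that edges are stored as consumer lists; the function name and the data layout are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <vector>

using Edges = std::map<int, std::vector<int>>;  // node -> downstream consumers

// Return `changed` and every node downstream of it, ordered so that each
// node comes after all of its recomputed inputs.
std::vector<int> updateOrder(const Edges& edges, int changed) {
    std::vector<int> postorder;
    std::set<int> visited;
    std::function<void(int)> visit = [&](int n) {
        if (!visited.insert(n).second) return;
        auto it = edges.find(n);
        if (it != edges.end())
            for (int next : it->second) visit(next);
        postorder.push_back(n);  // pushed only after all of its consumers
    };
    visit(changed);
    // For an acyclic graph, reverse DFS postorder is a topological order.
    std::reverse(postorder.begin(), postorder.end());
    return postorder;
}
```

Nodes upstream of the change, or on unrelated branches, are never visited and can keep their cached results; in a graph 5 → 1 → {2, 3} → 4, changing node 1 recomputes 1, 2, 3, and 4 but not 5.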

Data Availability

Program and source code are accessible at https://bitbucket.org/maxmalek/xcv. The mouse volume used in some of the examples was converted from the Digimouse dataset [43]. The human head MRI data set was donated under the condition of anonymity and cannot be published.

Conflicts of Interest

There are no conflicts of interest, financial or otherwise, associated with this paper.

Acknowledgments

This work was supported by the TU Graz Open Access Publishing Fund.

Supplementary Materials

The supplement is a quick-start and developer’s guide for our software, xcv. It describes how to extend the program, the related APIs, and good GLSL practices to maximize compatibility across graphics drivers. The supplement also provides a hands-on example for the implementation of a new node type (Supplementary Materials).

References

[1] D. Fortmeier, Direct volume rendering methods for needle insertion simulation [Ph.D. dissertation], University of Lübeck, Germany, 2016, http://www.zhb.uni-luebeck.de/epubs/ediss1748.pdf.

[2] J. Zhou, X. Wang, H. Cui et al., “Topology-aware illumination design for volume rendering,” BMC Bioinformatics, vol. 17, no. 1, 2016.

[3] R. Srinivasan and S. Fang, “Integrating volume morphing and visualization,” Computational Geometry, vol. 15, no. 1-3, pp. 149–159, 2000.

[4] E. Smistad, M. Bozorgi, and F. Lindseth, “FAST: framework for heterogeneous medical image computing and visualization,” International Journal for Computer Assisted Radiology and Surgery, vol. 10, no. 11, pp. 1811–1822, 2015.

[5] M. A. Akhloufi, F. Gariepy, and G. Champagne, “GPGPU real-time texture analysis framework,” in Proceedings of the Conference on Parallel Processing for Imaging Applications 2011, J. D. Owens, I. Lin, Y. Zhang, and G. B. Beretta, Eds., vol. 7872, pp. 7872-7872, San Francisco Airport, CA, USA, 2011.


[6] R. P. Broussard and R. W. Ives, “Using a commercial graphical processing unit and the CUDA programming language to accelerate scientific image processing applications,” in Proceedings of the Conference on Parallel Processing for Imaging Applications 2011, J. D. Owens, I. Lin, Y. Zhang, and G. B. Beretta, Eds., SPIE Proceedings, San Francisco Airport, California, USA.

[7] A. Eklund, M. Andersson, and H. Knutsson, “fMRI analysis on the GPU: Possibilities and challenges,” Computer Methods and Programs in Biomedicine, vol. 105, no. 2, pp. 145–161, 2012.

[8] F. Dachille, K. Kreeger, B. Chen, I. Bitter, and A. E. Kaufman, “High-quality volume rendering using texture mapping hardware,” in Proceedings of the 1998 ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, A. E. Kaufman, W. Straßer, G. Knittel, H. Pfister, and S. N. Spencer, Eds., vol. 31, Lisbon, Portugal, 1998.

[9] M. D. Hanwell, K. M. Martin, A. Chaudhary, and L. S. Avila, “The Visualization Toolkit (VTK): Rewriting the rendering code for modern graphics cards,” SoftwareX, vol. 1-2, pp. 9–12, 2015.

[10] J. Sanders and E. Kandrot, CUDA by Example: An Introduction to General-Purpose GPU Programming, Addison-Wesley, 2011, http://www.worldcat.org/oclc/699702402.

[11] R. Yang and G. Welch, “Fast image segmentation and smoothing using commodity graphics hardware,” Journal of Graphics (GPU & Game) Tools, vol. 7, no. 4, pp. 91–100, 2002.

[12] F. D. Igual, R. Mayo, T. D. R. Hartley, U. V. Catalyurek, A. Ruiz, and M. Ujaldon, “Exploring the GPU for enhancing parallelism on color and texture analysis,” in Parallel Computing: From Multicores and GPUs to Petascale, Proceedings of the Conference ParCo, B. M. Chapman, F. Desprez, G. R. Joubert, A. Lichnewsky, F. J. Peters, and T. Priol, Eds., vol. 19, pp. 299–306, IOS Press, 2009.

[13] K. O. W. Group, The OpenCL Specification, Version 2.2, 2017, https://www.khronos.org/opencl/.

[14] P. Cozzi and C. Riccio, OpenGL Insights, CRC Press, 2012, http://www.openglinsights.com.

[15] M. Bailey and S. Cunningham, “Computer graphics shaders using OpenGL 4.x,” in Proceedings of the ACM SIGGRAPH ASIA 2010 Courses, pp. 1–173, Seoul, Republic of Korea, December 2010.

[16] M. Bailey and S. Cunningham, Eds., Graphics Shaders: Theory and Practice, Taylor & Francis, 2nd edition, 2011.

[17] P. Jeremias and I. Quilez, “Shadertoy: Learn to create everything in a fragment shader,” in Proceedings of the SIGGRAPH Asia 2014 Courses, pp. 1–15, New York, NY, USA, December 2014.

[18] M. Reunanen, Times of Change in the Demoscene: A Creative Community and Its Relationship with Technology [Ph.D. dissertation], University of Turku, 2017.

[19] D. Hartmann, Digital Art Natives: Praktiken, Artefakte und Strukturen der Computer-Demoszene, Kulturverlag Kadmos, Berlin, http://digitalartnatives.de.

[20] H. Marin-Vega, G. Alor-Hernandez, R. Zatarain-Cabada, M. L. Barron-Estrada, and J. L. García-Alcaraz, “A Brief Review of Game Engines for Educational and Serious Games Development,” Journal of Information Technology Research, vol. 10, no. 4, pp. 1–22, 2017.

[21] A. L. Turinsky, E. Fanea, Q. Trinh et al., “CAVEman: Standardized anatomical context for biomedical data mapping,” Anatomical Sciences Education, vol. 1, no. 1, pp. 10–18, 2008.

[22] R. Ierusalimschy, L. H. de Figueiredo, and W. C. Filho, “Lua: an extensible extension language,” Software: Practice and Experience, vol. 26, no. 6, pp. 635–652, 1996.

[23] S. Lantinga, Simple DirectMedia Layer, 2017, http://libsdl.org.

[24] O. Cornut, https://github.com/ocornut/imgui.

[25] K. O. W. Group, The OpenGL Graphics System: A Specification (Version 4.5 (Core Profile)), https://www.khronos.org/registry/OpenGL/specs/gl/glspec45.core.pdf.

[26] C. Everitt, G. Sellers, J. McDonald, and T. Foley, https://www.slideshare.net/CassEveritt/approaching-zero-driver-overhead.

[27] Y. Collet, Zstandard, 2015, http://zstd.net.

[28] H. J. Johnson, M. McCormick, L. Ibanez, and T. I. S. Consortium, The ITK Software Guide, Kitware, Inc., 3rd edition, https://itk.org/ItkSoftwareGuide.pdf.

[29] The HDF Group (1997-2017), Hierarchical Data Format, version 5, http://www.hdfgroup.org/HDF5.

[30] https://itk.org/Wiki/ITK/File_Formats.

[31] S. Barrett, https://github.com/nothings/stb.

[32] F. Ritter, T. Boskamp, A. Homeyer et al., “Medical image analysis,” IEEE Pulse, vol. 2, no. 6, pp. 60–70, 2011.

[33] C. P. Botha, “DeVIDE: The Delft visualisation and image processing development environment,” Tech. Rep., Delft Technical University, 2004, https://graphics.tudelft.nl/Publications-new/2004/BO04a.

[34] R. E. Gabr, G. B. Tefera, W. J. Allen, A. S. Pednekar, and P. A. Narayana, “Erratum to: GRAPE: a graphical pipeline environment for image analysis in adaptive magnetic resonance imaging,” International Journal for Computer Assisted Radiology and Surgery, vol. 12, no. 3, pp. 459-459, 2017.

[35] A. E. Szalo, A. Zehner, and C. Palm, “GraphMIC,” in Bildverarbeitung für die Medizin 2015, Informatik aktuell, pp. 395–400, Springer, Berlin, Heidelberg, 2015.

[36] W. Schroeder, K. Martin, and B. Lorensen, The Visualization Toolkit, Kitware, Inc., 3rd edition, 2004, http://www.worldcat.org/isbn/1930934122.

[37] I. Wolf, M. Vetter, I. Wegner et al., “The medical imaging interaction toolkit,” Medical Image Analysis, vol. 9, no. 6, pp. 594–604, 2005.

[38] M. Nolden, S. Zelzer, A. Seitel et al., “The medical imaging interaction toolkit: Challenges and advances: 10 years of open-source development,” International Journal for Computer Assisted Radiology and Surgery, vol. 8, no. 4, pp. 607–620, 2013.

[39] W.-K. Jeong, H. Pfister, and M. Fatica, “Medical image processing using GPU-accelerated ITK image filters,” GPU Computing Gems Emerald Edition, pp. 737–749, 2011.

[40] J. Blanchette and M. Summerfield, C++ GUI Programming with Qt 4, Prentice Hall, 2006.

[41] G. van Rossum, “Python programming language,” in Proceedings of the 2007 USENIX Annual Technical Conference, J. Chase and S. Seshan, Eds., Santa Clara, CA, USA, 2007, https://www.usenix.org/publications/proceedings?f[0]=im_group_audience%3A114.

[42] A. Polukhin, Boost C++ Application Development Cookbook, Packt Publishing, 2nd edition, 2017.

[43] B. Dogdas, D. Stout, A. F. Chatziioannou, and R. M. Leahy, “Digimouse: a 3D whole body mouse atlas from CT and cryosection data,” Physics in Medicine and Biology, vol. 52, no. 3, pp. 577–587, 2007.

[44] S. Roettger, http://lgdv.cs.fau.de/External/vollib/.

[45] P. F. Felzenszwalb and D. P. Huttenlocher, “Distance transforms of sampled functions,” Theory of Computing: An Open Access Journal, vol. 8, pp. 415–428, 2012.


[46] K. E. Batcher, “Sorting networks and their applications,” in Proceedings of the April 30–May 2, 1968, Spring Joint Computer Conference, p. 307, Atlantic City, New Jersey, April 1968.

[47] J. Barczak, OpenGL Is Broken, The Burning Basis Vector [blog], 2014, http://www.joshbarczak.com/blog/?p=154.

[48] T. K. V. W. Group, Vulkan 1.0.66 - A Specification, 2017, https://www.khronos.org/vulkan/.

Figure 5 A minimal working example This node calculates (V +add)exp for every pixel and color channel in a 2D image resultingin contrast enhancement The only Lua code is the definitiontable in the last line the rest is GLSL embedded in a Lua stringMissing interface functions are automatically inducedThe resultinggraphical representation is displayed in the small boxnode (darkbackground green title bar) More information can be found in thesupplement (available here)

adjustment in real-time (Figures 4 and 5) Median filteringon the GPU is costly due to the sorting involved sorting isimplemented using a parallel sorting network [46]

4 Discussion

We expect that our framework will make GPU-accelerateddata processing more accessible While the examples givenso far mainly target use cases in the medical domain ourmethod is not limited to these and can be used formany kindsof 3D data processing that involve images or parallelizableoperations on a block of data The motivation to create aself-contained framework for rapid prototyping and built-in visualization came from difficulties when developingcertain methods to operate on CTMRI data The intendedalgorithms were too costly to execute on CPUs in anyreasonable time so GPU acceleration was a necessity Manytime-consuming problems during development could havebeen avoided if not only the inspection of data had beensupported visually but also the in-memory representationhad been easily accessible Our prototype of the pipelinewas hardcoded in C++ using plain OpenGL [47] with-out rapid prototyping functionality or automatic memorymanagement thus changing parameters or rewiring pipelineconnections required recompiling and rerunning the wholeprogram often causing problems due to technical oversightsand remaining bugs Our current framework attempts tosolve these problems It accelerates the once time-consumingpart of pipeline development so that the implementationcycles for new features are much shorter and also much moreefficient Regarding the implementation we intentionally rely

solely onOpenGL forGPUacceleration because it is themostcompatible complete and platform-independent graphicsand compute API currently available Other options eitherare vendor-locked (CUDA) have no graphics capabilities(OpenCL) or are exclusive to an operating system family(DirectX Metal) OpenGL does not come without problemshowever GPU drivers are mostly proprietary each imple-menting a different interpretation of the OpenGL specifi-cation sometimes exposing implementation differences andbugs [4 47] This may affect the core implementation there-fore care has been taken to adhere to the OpenGL 45 specifi-cation but some drivers may still cause problems Practicallythismay also cause user-writtenGLSL shaders towork fine ona specific setup but fail to compile or misbehave on anotherif the respective driverrsquos GLSL compilers exhibit differences

We also chose deliberately to not only enable but alsoenforce all node computations to be performed by the GPUWhile this rules out incorporating popular libraries suchas ITK and OpenCV we believe that this is the only wayto ensure a consistently fast pipeline without unnecessarilyslow legacy components The current focususe case of theapplication is novel development instead of reusing existingcomponents in a new package This is not expected to be aproblem for most users as this is what sets our frameworkapart from others

Given the ability to type in code at run-time andinspecting results immediately intuition-guided explorationof data is much easier and more interactive in comparisonto existing solutions as they offer mostly premade buildingblocksThe combination of rapid prototyping throughout ourentire implementation while at the same time also GPU-accelerating all calculations allows a user to process largerdatasets faster than with any of the other tools as these wereapparently not built with this kind of dynamism in mind

A selection of algorithms is included in the first publicrelease They might not be sufficient to cover all use casestherefore the use for medical research is not yet the primaryfocus of our package However even at this early stage ofdevelopment our software is useful for teaching and exten-sion for specific tasks by anyone

41 Future Work In its current state the software is a stand-alone application rather than a library Future functionalitymay include a program exporter to design entire processingpipelines inside the user interface and then export a singlescript that implements this dataflow graph This would beuseful for batch processing and inclusion in existing pipelinesand is expected to greatly simplify deployment for end usersWe also plan to include support for point clouds and regular3Dmeshes as they are strongly tied to graphics developmentand benefit greatly from GPU-accelerated processing Webelieve combining the ability to manipulate and visualizethese types of data in one package will be beneficial to otherrelated fieldsMoving away fromOpenGL is not a plan for theimmediate future but a long-term goal Switching to Vulkan[48] as a backend would not only enable multithreadedmulti-GPU computation but also enable support for mobiledevices (eg Android-based tablets) and hopefully minimizedriver-specific behavior when compared to OpenGL

International Journal of Biomedical Imaging 7

(a) (b)

Figure 6 Comparison of our graph editor (b) with the interface of DeVIDE (a) Our graph editor shows a lot more detail including a previewwhen hovering in-outputs if possible It also checks formisuse that is ensures connector type compatibility and prevents cyclesThe depictedDeVIDE graph did not work despite trying multiple variants There was no sign of error when building the graph

Figure 7 A dataflow graph to visualize a mouse CT volume Both ldquo3D Volumerdquo nodes render a volume texture into a 2D image givenindividual settings In this example the resulting 2D image is set to 4K resolution (3840 times 2160) An estimate of the time in milliseconds thata node spent computing on the GPU is displayed on each node respectively Any parameterimage changes propagate downstream so thatthe whole graph updates itself as necessary

Data Availability

Program and source code are accessible at httpsbitbucketorgmaxmalekxcv The mouse volume used in some of theexamples was converted from theDigimouse dataset [43]Thehuman head MRI data set was donated under the conditionof anonymity and cannot be published

Conflicts of Interest

There are no conflicts of interest financial or otherwiseassociated with this paper

Acknowledgments

Thisworkwas supported byTUGrazOpenAccess PublishingFund

Supplementary Materials

The supplement is a quick-start and developerrsquos guide for oursoftware xcv It describes how to extend the program relatedAPIs and good GLSL practices to maximize compatibilityacross graphics drivers The supplement also provides a

hands-on example for the implementation of a new nodetype (Supplementary Materials)

References

[1] D Fortmeier Direct volume rendering methods for needle inser-tion simulation [PhD Dissertation] University of LubeckGermany 2016 httpwwwzhbuni-luebeckdeepubsediss1748pdf

[2] J Zhou X Wang H Cui et al ldquoTopology-aware illuminationdesign for volume renderingrdquo BMC Bioinformatics vol 17 no1 2016

[3] R Srinivasan and S Fang ldquoIntegrating volume morphing andvisualizationrdquoComputational Geometry vol 15 no 1-3 pp 149ndash159 2000

[4] E Smistad M Bozorgi and F Lindseth ldquoFAST frameworkfor heterogeneousmedical image computing and visualizationrdquoInternational Journal for Computer Assisted Radiology andSurgery vol 10 no 11 pp 1811ndash1822 2015

[5] M A Akhloufi F Gariepy and G Champagne ldquoGPGPU real-time texture analysis frameworkrdquo in Proceedings of the Con-ference on Parallel Processing for Imaging Applications 2011 JD Owens I Lin Y Zhang and G B Beretta Eds vol 7872pp 7872-7872 San Francisco Airport CA USA 2011

8 International Journal of Biomedical Imaging

[6] R P Broussard and R W Ives ldquoUsing a commercial graphicalprocessing unit and theCUDAprogramming language to accel-erate scientific image processing applicationsrdquo in Proceedings ofthe Conference on Parallel Processing for Imaging Applications2011 J D Owens I Lin Y Zhang and G B Beretta Eds SPIEProceedings San Francisco Airport California USA

[7] A. Eklund, M. Andersson, and H. Knutsson, “fMRI analysis on the GPU: Possibilities and challenges,” Computer Methods and Programs in Biomedicine, vol. 105, no. 2, pp. 145–161, 2012.

[8] F. Dachille, K. Kreeger, B. Chen, I. Bitter, and A. E. Kaufman, “High-quality volume rendering using texture mapping hardware,” in Proceedings of the 1998 ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, A. E. Kaufman, W. Straßer, G. Knittel, H. Pfister, and S. N. Spencer, Eds., vol. 31, Lisbon, Portugal, 1998.

[9] M. D. Hanwell, K. M. Martin, A. Chaudhary, and L. S. Avila, “The Visualization Toolkit (VTK): Rewriting the rendering code for modern graphics cards,” SoftwareX, vol. 1-2, pp. 9–12, 2015.

[10] J. Sanders and E. Kandrot, CUDA by Example: An Introduction to General-Purpose GPU Programming, Addison-Wesley, 2011, http://www.worldcat.org/oclc/699702402.

[11] R. Yang and G. Welch, “Fast image segmentation and smoothing using commodity graphics hardware,” Journal of Graphics (GPU & Game) Tools, vol. 7, no. 4, pp. 91–100, 2002.

[12] F. D. Igual, R. Mayo, T. D. R. Hartley, U. V. Catalyurek, A. Ruiz, and M. Ujaldon, “Exploring the GPU for enhancing parallelism on color and texture analysis,” in Parallel Computing: From Multicores and GPUs to Petascale, Proceedings of the Conference ParCo, B. M. Chapman, F. Desprez, G. R. Joubert, A. Lichnewsky, F. J. Peters, and T. Priol, Eds., vol. 19, pp. 299–306, IOS Press, 2009.

[13] K. O. W. Group, The OpenCL Specification, Version 2.2, 2017, https://www.khronos.org/opencl.

[14] P. Cozzi and C. Riccio, OpenGL Insights, CRC Press, 2012, http://www.openglinsights.com.

[15] M. Bailey and S. Cunningham, “Computer graphics shaders using OpenGL 4.x,” in Proceedings of the ACM SIGGRAPH ASIA 2010 Courses, pp. 1–173, Seoul, Republic of Korea, December 2010.

[16] M. Bailey and S. Cunningham, Eds., Graphics Shaders: Theory and Practice, Taylor & Francis, 2nd edition, 2011.

[17] P. Jeremias and I. Quilez, “Shadertoy: Learn to create everything in a fragment shader,” in Proceedings of the SIGGRAPH Asia 2014 Courses, pp. 1–15, New York, NY, USA, December 2014.

[18] M. Reunanen, Times of Change in the Demoscene: A Creative Community and Its Relationship with Technology [PhD dissertation], University of Turku, 2017.

[19] D. Hartmann, Digital Art Natives: Praktiken, Artefakte und Strukturen der Computer-Demoszene, Kulturverlag Kadmos, Berlin, http://digitalartnatives.de.

[20] H. Marin-Vega, G. Alor-Hernandez, R. Zatarain-Cabada, M. L. Barron-Estrada, and J. L. García-Alcaraz, “A Brief Review of Game Engines for Educational and Serious Games Development,” Journal of Information Technology Research, vol. 10, no. 4, pp. 1–22, 2017.

[21] A. L. Turinsky, E. Fanea, Q. Trinh et al., “CAVEman: Standardized anatomical context for biomedical data mapping,” Anatomical Sciences Education, vol. 1, no. 1, pp. 10–18, 2008.

[22] R. Ierusalimschy, L. H. de Figueiredo, and W. C. Filho, “Lua: an extensible extension language,” Software: Practice and Experience, vol. 26, no. 6, pp. 635–652, 1996.

[23] S. Lantinga, Simple DirectMedia Layer, 2017, http://libsdl.org.

[24] O. Cornut, https://github.com/ocornut/imgui.

[25] K. O. W. Group, The OpenGL Graphics System: A Specification (Version 4.5 (Core Profile)), https://www.khronos.org/registry/OpenGL/specs/gl/glspec45.core.pdf.

[26] C. Everitt, G. Sellers, J. McDonald, and T. Foley, https://www.slideshare.net/CassEveritt/approaching-zero-driver-overhead.

[27] Y. Collet, Zstandard, 2015, http://zstd.net.

[28] H. J. Johnson, M. McCormick, L. Ibanez, and T. I. S. Consortium, The ITK Software Guide, Kitware, Inc., 3rd edition, https://itk.org/ItkSoftwareGuide.pdf.

[29] The HDF Group (1997-2017), Hierarchical Data Format, Version 5, http://www.hdfgroup.org/HDF5.

[30] https://itk.org/Wiki/ITK/File_Formats.

[31] S. Barrett, https://github.com/nothings/stb.

[32] F. Ritter, T. Boskamp, A. Homeyer et al., “Medical image analysis,” IEEE Pulse, vol. 2, no. 6, pp. 60–70, 2011.

[33] C. P. Botha, “DeVIDE: The Delft visualisation and image processing development environment,” Tech. Rep., Delft Technical University, 2004, https://graphics.tudelft.nl/Publications-new/2004/BO04a.

[34] R. E. Gabr, G. B. Tefera, W. J. Allen, A. S. Pednekar, and P. A. Narayana, “Erratum to: GRAPE: a graphical pipeline environment for image analysis in adaptive magnetic resonance imaging,” International Journal for Computer Assisted Radiology and Surgery, vol. 12, no. 3, p. 459, 2017.

[35] A. E. Szalo, A. Zehner, and C. Palm, “GraphMIC,” in Bildverarbeitung für die Medizin 2015, Informatik aktuell, pp. 395–400, Springer Berlin Heidelberg, Berlin, Heidelberg, 2015.

[36] W. Schroeder, K. Martin, and B. Lorensen, The Visualization Toolkit, Kitware, Inc., 3rd edition, 2004, http://www.worldcat.org/isbn/1930934122.

[37] I. Wolf, M. Vetter, I. Wegner et al., “The medical imaging interaction toolkit,” Medical Image Analysis, vol. 9, no. 6, pp. 594–604, 2005.

[38] M. Nolden, S. Zelzer, A. Seitel et al., “The medical imaging interaction toolkit: Challenges and advances: 10 years of open-source development,” International Journal for Computer Assisted Radiology and Surgery, vol. 8, no. 4, pp. 607–620, 2013.

[39] W.-K. Jeong, H. Pfister, and M. Fatica, “Medical image processing using GPU-accelerated ITK image filters,” in GPU Computing Gems Emerald Edition, pp. 737–749, 2011.

[40] J. Blanchette and M. Summerfield, C++ GUI Programming with Qt 4, Prentice Hall, 2006.

[41] G. van Rossum, “Python programming language,” in Proceedings of the 2007 USENIX Annual Technical Conference, J. Chase and S. Seshan, Eds., Santa Clara, CA, USA, 2007, https://www.usenix.org/publications/proceedings?f[0]=im_group_audience%3A114.

[42] A. Polukhin, Boost C++ Application Development Cookbook, Packt Publishing, 2nd edition, 2017.

[43] B. Dogdas, D. Stout, A. F. Chatziioannou, and R. M. Leahy, “Digimouse: a 3D whole body mouse atlas from CT and cryosection data,” Physics in Medicine and Biology, vol. 52, no. 3, pp. 577–587, 2007.

[44] S. Roettger, http://lgdv.cs.fau.de/External/vollib.

[45] P. F. Felzenszwalb and D. P. Huttenlocher, “Distance transforms of sampled functions,” Theory of Computing: An Open Access Journal, vol. 8, pp. 415–428, 2012.

International Journal of Biomedical Imaging 9

[46] K. E. Batcher, “Sorting networks and their applications,” in Proceedings of the April 30–May 2, 1968, Spring Joint Computer Conference, p. 307, Atlantic City, New Jersey, April 1968.

[47] J. Barczak, OpenGL Is Broken, The Burning Basis Vector [blog], 2014, http://www.joshbarczak.com/blog/?p=154.

[48] T. K. V. W. Group, Vulkan 1.0.66 - A Specification, 2017, https://www.khronos.org/vulkan.


Figure 3: The distance transform from Table 1, combined with a custom compute node running the GLSL snippet on the right, is a simple way to remove most of the skull in a head MRI scan. The result is stereo-rendered in red/cyan.
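To make the distance transform used in Figure 3 and Table 1 concrete, the following Python sketch computes a 3D Euclidean distance transform with the separable lower-envelope algorithm of Felzenszwalb and Huttenlocher [45], applied once along each axis. It assumes that voxels with a normalized value of at least 0.5 (the solidity threshold from Table 1) are solid and that every other voxel receives its distance to the nearest solid voxel. The function names, the nested-list volume layout indexed as `volume[z][y][x]`, and the large constant standing in for infinity are illustrative assumptions; this is neither the xcv API nor the authors' own CPU implementation.

```python
import math

BIG = 1e20  # assumption: finite stand-in for "infinitely far", avoids inf - inf = nan


def edt_1d(f):
    """Squared 1D distance transform of the sampled function f (lower envelope of parabolas)."""
    n = len(f)
    v = [0] * n              # positions of the parabolas forming the lower envelope
    z = [0.0] * (n + 1)      # boundaries between neighboring envelope parabolas
    k = 0
    z[0], z[1] = -math.inf, math.inf
    for q in range(1, n):
        s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        while s <= z[k]:     # new parabola hides the previous one: pop it
            k -= 1
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = math.inf
    d = [0.0] * n
    k = 0
    for q in range(n):       # read the envelope back at every sample position
        while z[k + 1] < q:
            k += 1
        d[q] = (q - v[k]) ** 2 + f[v[k]]
    return d


def edt_3d(volume, threshold=0.5):
    """Euclidean distance from every voxel to the nearest voxel whose value is >= threshold."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    g = [[[0.0 if volume[z][y][x] >= threshold else BIG for x in range(nx)]
          for y in range(ny)] for z in range(nz)]
    for z in range(nz):                      # pass 1: along x
        for y in range(ny):
            g[z][y] = edt_1d(g[z][y])
    for z in range(nz):                      # pass 2: along y
        for x in range(nx):
            col = edt_1d([g[z][y][x] for y in range(ny)])
            for y in range(ny):
                g[z][y][x] = col[y]
    for y in range(ny):                      # pass 3: along z
        for x in range(nx):
            col = edt_1d([g[z][y][x] for z in range(nz)])
            for z in range(nz):
                g[z][y][x] = col[z]
    return [[[math.sqrt(val) for val in row] for row in plane] for plane in g]
```

Each 1D pass is linear in the line length, and every line along one axis is independent of the others, which is the property that makes this formulation attractive for parallel (including GPU) execution.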


Figure 4: Slice of a brain MRI scan. (a) Normalized but otherwise unprocessed; (b) segmented into skull/skin (teal), cerebrospinal fluid (red), and brain (grey) tissue. The brain tissue is contrast-enhanced.

Table 1: Timings for selected operations on 3 different volumes. The speedup factor between CPU and GPU is calculated as the lowest CPU time (8 threads) divided by the GPU time. The 3D Euclidean distance transform operates on normalized volumes (pixel values in [0⋯1]) with a solidity threshold of 0.5. The CPU implementation is our own, including the multithreading. Intensity rescaling and median filtering use ITK's CPU-based implementation. The median filter takes the direct neighborhood of each voxel into account, that is, a box of 27 voxels in total.

Volume | CPU (1 thread) | CPU (8 threads) | GPU | Speedup 1 versus 8 threads | Speedup CPU versus GPU

Distance transform
256 × 256 × 176 | 1345 s | 459 s | 630 ms | 2.9x | 73x
512 × 512 × 19 | 632 s | 203 s | 217 ms | 3.1x | 93x
380 × 992 × 208 | 1631 s | 627 s | 4900 ms | 2.6x | 128x

Rescale intensity to [0⋯1]
256 × 256 × 176 | 88 ms | 68 ms | 101 ms | 1.29x | 67x
512 × 512 × 19 | 38 ms | 30 ms | 72 ms | 1.27x | 417x
380 × 992 × 208 | 603 ms | 457 ms | 975 ms | 1.31x | 47x

Median filter
256 × 256 × 176 | 4155 ms | 1544 ms | 97 ms | 2.7x | 159x
512 × 512 × 19 | 2120 ms | 671 ms | 48 ms | 3.15x | 14x
380 × 992 × 208 | 14 s | 4800 ms | 715 ms | 2.9x | 67x


Figure 5: A minimal working example. This node calculates (V + add)^exp for every pixel and color channel in a 2D image, resulting in contrast enhancement. The only Lua code is the definition table in the last line; the rest is GLSL embedded in a Lua string. Missing interface functions are induced automatically. The resulting graphical representation is displayed in the small box/node (dark background, green title bar). More information can be found in the supplement (see Supplementary Materials).
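The per-element operation in Figure 5 can be stated as a plain CPU reference, which is useful for checking a shader's output on a few pixels. The following Python sketch applies (V + add)^exp to every channel of every pixel. The parameter names `add` and `exp` follow the caption; the function name `contrast_node` and the image layout (a list of rows of channel tuples with normalized float values) are hypothetical and not part of the xcv API. Because GLSL's `pow(x, y)` is undefined for x < 0, the sketch clamps the base at zero; whether the original GLSL snippet does the same is not stated.

```python
def contrast_node(image, add=0.0, exp=1.0):
    """CPU reference for the per-element operation (V + add)^exp described for Figure 5.

    image is a list of rows; each row is a list of pixels; each pixel is a tuple of
    channel values, assumed to be normalized floats (hypothetical layout).
    """
    def apply(v):
        # Assumption: clamp the base at 0, since GLSL pow(x, y) is undefined for x < 0.
        base = max(v + add, 0.0)
        return base ** exp

    return [[tuple(apply(c) for c in pixel) for pixel in row] for row in image]


# Usage: exp < 1 raises values in (0, 1), exp > 1 lowers them.
img = [[(0.04, 0.25, 1.0)], [(0.5, 0.0, 0.81)]]
print(contrast_node(img, add=0.0, exp=0.5))
```

On the GPU, the same expression would run once per pixel in a compute shader invocation, which is why the node needs no explicit loop in its GLSL body.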

adjustment in real-time (Figures 4 and 5). Median filtering on the GPU is costly due to the sorting involved; sorting is implemented using a parallel sorting network [46].
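A sorting network fixes the sequence of compare-and-swap steps in advance, independent of the data, so every GPU thread can execute the same instructions. The following Python sketch builds Batcher's odd-even merge sort network [46] for a power-of-two size and uses it to take the median of a 27-voxel (3 × 3 × 3) neighborhood, padded to 32 entries with +∞. Batcher's paper describes both this network and the bitonic sorter; which variant xcv uses is not stated, and the function names here are hypothetical.

```python
import math


def oddeven_merge_sort_network(n):
    """Comparators (i, j), i < j, of Batcher's odd-even merge sort for n = 2**k inputs."""
    assert n > 0 and (n & (n - 1)) == 0, "n must be a power of two"
    pairs = []
    p = 1
    while p < n:                 # length of the already-sorted runs being merged
        k = p
        while k >= 1:            # comparator distance within this merge stage
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k - 1, n - j - k - 1) + 1):
                    # only compare elements that belong to the same block of size 2p
                    if (i + j) // (2 * p) == (i + j + k) // (2 * p):
                        pairs.append((i + j, i + j + k))
            k //= 2
        p *= 2
    return pairs


NETWORK_32 = oddeven_merge_sort_network(32)


def median27(values):
    """Median of a 3 x 3 x 3 neighborhood (27 values) via a fixed 32-input network."""
    if len(values) != 27:
        raise ValueError("expected 27 values")
    a = list(values) + [math.inf] * 5   # padding sorts to positions 27..31
    for i, j in NETWORK_32:
        # compare-and-swap written as min/max, without an if-statement
        a[i], a[j] = min(a[i], a[j]), max(a[i], a[j])
    return a[13]                        # 14th smallest of the 27 real values
```

The comparator list is computed once and reused for every voxel; only the data changes, which mirrors how a shader would unroll a fixed network.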

4. Discussion

We expect that our framework will make GPU-accelerated data processing more accessible. While the examples given so far mainly target use cases in the medical domain, our method is not limited to these and can be used for many kinds of 3D data processing that involve images or parallelizable operations on a block of data. The motivation to create a self-contained framework for rapid prototyping with built-in visualization came from difficulties we encountered when developing certain methods to operate on CT/MRI data. The intended algorithms were too costly to execute on CPUs in any reasonable time, so GPU acceleration was a necessity. Many time-consuming problems during development could have been avoided if the data could have been inspected visually and its in-memory representation had been easily accessible. Our prototype of the pipeline was hardcoded in C++ using plain OpenGL [47], without rapid prototyping functionality or automatic memory management; thus, changing parameters or rewiring pipeline connections required recompiling and rerunning the whole program, which often caused problems due to technical oversights and remaining bugs. Our current framework attempts to solve these problems. It accelerates the once time-consuming part of pipeline development, so that implementation cycles for new features are much shorter and more efficient. Regarding the implementation, we intentionally rely solely on OpenGL for GPU acceleration because it is the most compatible, complete, and platform-independent graphics and compute API currently available. Other options are either vendor-locked (CUDA), have no graphics capabilities (OpenCL), or are exclusive to one operating system family (DirectX, Metal). OpenGL does not come without problems, however: GPU drivers are mostly proprietary, each implementing a different interpretation of the OpenGL specification, sometimes exposing implementation differences and bugs [4, 47]. This may affect the core implementation; therefore, care has been taken to adhere to the OpenGL 4.5 specification, but some drivers may still cause problems. In practice, this may also cause user-written GLSL shaders to work fine on one setup but fail to compile or misbehave on another if the respective drivers' GLSL compilers behave differently.

We also deliberately chose not only to enable but also to enforce that all node computations are performed by the GPU. While this rules out incorporating popular libraries such as ITK and OpenCV, we believe that this is the only way to ensure a consistently fast pipeline without unnecessarily slow legacy components. The current focus and use case of the application is novel development rather than reusing existing components in a new package. We do not expect this to be a problem for most users, as this is what sets our framework apart from others.

Given the ability to type in code at run-time and inspect the results immediately, intuition-guided exploration of data is much easier and more interactive than with existing solutions, which mostly offer premade building blocks. Combining rapid prototyping throughout our entire implementation with GPU acceleration of all calculations allows a user to process larger datasets faster than with any of the other tools, as these were apparently not built with this kind of dynamism in mind.

A selection of algorithms is included in the first public release. These might not be sufficient to cover all use cases; therefore, medical research is not yet the primary focus of our package. However, even at this early stage of development, our software is useful for teaching and can be extended for specific tasks by anyone.

4.1. Future Work. In its current state, the software is a stand-alone application rather than a library. Future functionality may include a program exporter, so that entire processing pipelines can be designed inside the user interface and then exported as a single script that implements the dataflow graph. This would be useful for batch processing and for inclusion in existing pipelines, and is expected to greatly simplify deployment for end users. We also plan to include support for point clouds and regular 3D meshes, as they are strongly tied to graphics development and benefit greatly from GPU-accelerated processing. We believe that combining the ability to manipulate and visualize these types of data in one package will be beneficial to other related fields. Moving away from OpenGL is not a plan for the immediate future but a long-term goal. Switching to Vulkan [48] as a backend would enable not only multithreaded, multi-GPU computation but also support for mobile devices (e.g., Android-based tablets), and would hopefully minimize driver-specific behavior compared to OpenGL.


Figure 6: Comparison of our graph editor (b) with the interface of DeVIDE (a). Our graph editor shows much more detail, including a preview when hovering over inputs and outputs, if possible. It also checks for misuse, that is, it ensures connector type compatibility and prevents cycles. The depicted DeVIDE graph did not work despite trying multiple variants, and there was no sign of an error when building the graph.
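The misuse checks named in the Figure 6 caption amount to two tests before an edge is added: the connector types must match, and the target node must not already reach the source node, because the new edge would otherwise close a cycle. The following Python sketch shows one way to implement both with an iterative depth-first search. The `Graph` class, its method names, and the string type tags such as `"volume"` are hypothetical and not taken from xcv.

```python
class Graph:
    """Hypothetical dataflow graph that validates every new connection."""

    def __init__(self):
        self.inputs = {}    # node -> {input connector name: type tag}
        self.outputs = {}   # node -> {output connector name: type tag}
        self.edges = set()  # (src node, output name, dst node, input name)

    def add_node(self, name, inputs=None, outputs=None):
        self.inputs[name] = dict(inputs or {})
        self.outputs[name] = dict(outputs or {})

    def _successors(self, node):
        return {dst for (src, _, dst, _) in self.edges if src == node}

    def _reaches(self, start, goal):
        """Iterative depth-first search: is there a directed path from start to goal?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self._successors(node))
        return False

    def connect(self, src, out_name, dst, in_name):
        out_type = self.outputs[src][out_name]
        in_type = self.inputs[dst][in_name]
        if out_type != in_type:  # connector type compatibility
            raise TypeError(f"cannot connect a {out_type} output to a {in_type} input")
        # src -> dst closes a cycle iff dst already reaches src (this includes src == dst)
        if self._reaches(dst, src):
            raise ValueError(f"connecting {src} to {dst} would create a cycle")
        self.edges.add((src, out_name, dst, in_name))


# Usage: a small volume pipeline
g = Graph()
g.add_node("load", outputs={"out": "volume"})
g.add_node("edt", inputs={"in": "volume"}, outputs={"out": "volume"})
g.add_node("render", inputs={"in": "volume"}, outputs={"out": "image2d"})
g.connect("load", "out", "edt", "in")
g.connect("edt", "out", "render", "in")
# g.connect("render", "out", "edt", "in")  would raise TypeError (image2d vs. volume)
# g.connect("edt", "out", "edt", "in")     would raise ValueError (cycle)
```

Rejecting the edge at connection time, rather than failing later during evaluation, is what lets an editor report the problem immediately instead of silently building a graph that cannot run.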

Figure 7: A dataflow graph to visualize a mouse CT volume. Both “3D Volume” nodes render a volume texture into a 2D image, each with individual settings. In this example, the resulting 2D image is set to 4K resolution (3840 × 2160). An estimate of the time in milliseconds that each node spent computing on the GPU is displayed on the node. Any parameter or image changes propagate downstream, so that the whole graph updates itself as necessary.
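The downstream propagation described for Figure 7 can be realized in several ways. One common approach, sketched here in Python, combines dirty flags with pull-based evaluation: a change marks the changed node and everything downstream of it as dirty, and evaluating a node recomputes it only if it is dirty, after refreshing its inputs. The `Node` class, `set_param`, and `evaluate` are hypothetical names, not the xcv API, and plain Python callables stand in for the GPU computations that xcv nodes perform.

```python
class Node:
    """Hypothetical dataflow node with a dirty flag (not the xcv API)."""

    def __init__(self, name, func, inputs=(), **params):
        self.name = name
        self.func = func              # func(params, *input_values) -> output value
        self.inputs = list(inputs)    # upstream nodes, in argument order
        self.params = dict(params)
        self.downstream = []
        self.dirty = True             # nothing has been computed yet
        self.value = None
        self.evaluations = 0          # how often func ran, for illustration only
        for upstream in self.inputs:
            upstream.downstream.append(self)

    def set_param(self, key, value):
        self.params[key] = value
        self.mark_dirty()

    def mark_dirty(self):
        """Invalidate this node and, transitively, everything downstream of it."""
        if self.dirty:
            return  # everything downstream of a dirty node is already dirty
        self.dirty = True
        for node in self.downstream:
            node.mark_dirty()

    def evaluate(self):
        """Pull-based update: recompute only if dirty, after refreshing the inputs."""
        if self.dirty:
            args = [upstream.evaluate() for upstream in self.inputs]
            self.value = self.func(self.params, *args)
            self.evaluations += 1
            self.dirty = False
        return self.value


# Usage: a diamond-shaped graph
source = Node("source", lambda p: p["value"], value=2)
scale = Node("scale", lambda p, x: x * p["factor"], [source], factor=3)
offset = Node("offset", lambda p, x: x + p["add"], [source], add=1)
combine = Node("combine", lambda p, a, b: a + b, [scale, offset])

print(combine.evaluate())  # 9  (2 * 3 + (2 + 1))
scale.set_param("factor", 10)
print(combine.evaluate())  # 23 (2 * 10 + 3); "source" and "offset" are not recomputed
```

Caching results per node means that changing a parameter late in a pipeline only reruns the affected tail of the graph, which matters when upstream nodes process large volumes.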

Data Availability

Program and source code are accessible at https://bitbucket.org/maxmalek/xcv. The mouse volume used in some of the examples was converted from the Digimouse dataset [43]. The human head MRI data set was donated under the condition of anonymity and cannot be published.

Conflicts of Interest

There are no conflicts of interest, financial or otherwise, associated with this paper.

Acknowledgments

This work was supported by the TU Graz Open Access Publishing Fund.

Supplementary Materials

The supplement is a quick-start and developer's guide for our software xcv. It describes how to extend the program, the related APIs, and good GLSL practices to maximize compatibility across graphics drivers. The supplement also provides a hands-on example for the implementation of a new node type (Supplementary Materials).

References

[1] D. Fortmeier, Direct Volume Rendering Methods for Needle Insertion Simulation [PhD dissertation], University of Lübeck, Germany, 2016, http://www.zhb.uni-luebeck.de/epubs/ediss1748.pdf.

[2] J. Zhou, X. Wang, H. Cui et al., “Topology-aware illumination design for volume rendering,” BMC Bioinformatics, vol. 17, no. 1, 2016.

[3] R. Srinivasan and S. Fang, “Integrating volume morphing and visualization,” Computational Geometry, vol. 15, no. 1-3, pp. 149–159, 2000.

[4] E. Smistad, M. Bozorgi, and F. Lindseth, “FAST: framework for heterogeneous medical image computing and visualization,” International Journal for Computer Assisted Radiology and Surgery, vol. 10, no. 11, pp. 1811–1822, 2015.

[5] M. A. Akhloufi, F. Gariepy, and G. Champagne, “GPGPU real-time texture analysis framework,” in Proceedings of the Conference on Parallel Processing for Imaging Applications 2011, J. D. Owens, I. Lin, Y. Zhang, and G. B. Beretta, Eds., vol. 7872, pp. 7872-7872, San Francisco Airport, CA, USA, 2011.


[6] R. P. Broussard and R. W. Ives, “Using a commercial graphical processing unit and the CUDA programming language to accelerate scientific image processing applications,” in Proceedings of the Conference on Parallel Processing for Imaging Applications 2011, J. D. Owens, I. Lin, Y. Zhang, and G. B. Beretta, Eds., SPIE Proceedings, San Francisco Airport, California, USA.



6 International Journal of Biomedical Imaging

Figure 5 A minimal working example This node calculates (V +add)exp for every pixel and color channel in a 2D image resultingin contrast enhancement The only Lua code is the definitiontable in the last line the rest is GLSL embedded in a Lua stringMissing interface functions are automatically inducedThe resultinggraphical representation is displayed in the small boxnode (darkbackground green title bar) More information can be found in thesupplement (available here)

adjustment in real-time (Figures 4 and 5) Median filteringon the GPU is costly due to the sorting involved sorting isimplemented using a parallel sorting network [46]

4 Discussion

We expect that our framework will make GPU-accelerateddata processing more accessible While the examples givenso far mainly target use cases in the medical domain ourmethod is not limited to these and can be used formany kindsof 3D data processing that involve images or parallelizableoperations on a block of data The motivation to create aself-contained framework for rapid prototyping and built-in visualization came from difficulties when developingcertain methods to operate on CTMRI data The intendedalgorithms were too costly to execute on CPUs in anyreasonable time so GPU acceleration was a necessity Manytime-consuming problems during development could havebeen avoided if not only the inspection of data had beensupported visually but also the in-memory representationhad been easily accessible Our prototype of the pipelinewas hardcoded in C++ using plain OpenGL [47] with-out rapid prototyping functionality or automatic memorymanagement thus changing parameters or rewiring pipelineconnections required recompiling and rerunning the wholeprogram often causing problems due to technical oversightsand remaining bugs Our current framework attempts tosolve these problems It accelerates the once time-consumingpart of pipeline development so that the implementationcycles for new features are much shorter and also much moreefficient Regarding the implementation we intentionally rely

solely onOpenGL forGPUacceleration because it is themostcompatible complete and platform-independent graphicsand compute API currently available Other options eitherare vendor-locked (CUDA) have no graphics capabilities(OpenCL) or are exclusive to an operating system family(DirectX Metal) OpenGL does not come without problemshowever GPU drivers are mostly proprietary each imple-menting a different interpretation of the OpenGL specifi-cation sometimes exposing implementation differences andbugs [4 47] This may affect the core implementation there-fore care has been taken to adhere to the OpenGL 45 specifi-cation but some drivers may still cause problems Practicallythismay also cause user-writtenGLSL shaders towork fine ona specific setup but fail to compile or misbehave on anotherif the respective driverrsquos GLSL compilers exhibit differences

We also chose deliberately to not only enable but alsoenforce all node computations to be performed by the GPUWhile this rules out incorporating popular libraries suchas ITK and OpenCV we believe that this is the only wayto ensure a consistently fast pipeline without unnecessarilyslow legacy components The current focususe case of theapplication is novel development instead of reusing existingcomponents in a new package This is not expected to be aproblem for most users as this is what sets our frameworkapart from others

Given the ability to type in code at run-time andinspecting results immediately intuition-guided explorationof data is much easier and more interactive in comparisonto existing solutions as they offer mostly premade buildingblocksThe combination of rapid prototyping throughout ourentire implementation while at the same time also GPU-accelerating all calculations allows a user to process largerdatasets faster than with any of the other tools as these wereapparently not built with this kind of dynamism in mind

A selection of algorithms is included in the first publicrelease They might not be sufficient to cover all use casestherefore the use for medical research is not yet the primaryfocus of our package However even at this early stage ofdevelopment our software is useful for teaching and exten-sion for specific tasks by anyone

41 Future Work In its current state the software is a stand-alone application rather than a library Future functionalitymay include a program exporter to design entire processingpipelines inside the user interface and then export a singlescript that implements this dataflow graph This would beuseful for batch processing and inclusion in existing pipelinesand is expected to greatly simplify deployment for end usersWe also plan to include support for point clouds and regular3Dmeshes as they are strongly tied to graphics developmentand benefit greatly from GPU-accelerated processing Webelieve combining the ability to manipulate and visualizethese types of data in one package will be beneficial to otherrelated fieldsMoving away fromOpenGL is not a plan for theimmediate future but a long-term goal Switching to Vulkan[48] as a backend would not only enable multithreadedmulti-GPU computation but also enable support for mobiledevices (eg Android-based tablets) and hopefully minimizedriver-specific behavior when compared to OpenGL

International Journal of Biomedical Imaging 7

(a) (b)

Figure 6 Comparison of our graph editor (b) with the interface of DeVIDE (a) Our graph editor shows a lot more detail including a previewwhen hovering in-outputs if possible It also checks formisuse that is ensures connector type compatibility and prevents cyclesThe depictedDeVIDE graph did not work despite trying multiple variants There was no sign of error when building the graph

Figure 7 A dataflow graph to visualize a mouse CT volume Both ldquo3D Volumerdquo nodes render a volume texture into a 2D image givenindividual settings In this example the resulting 2D image is set to 4K resolution (3840 times 2160) An estimate of the time in milliseconds thata node spent computing on the GPU is displayed on each node respectively Any parameterimage changes propagate downstream so thatthe whole graph updates itself as necessary

Data Availability

Program and source code are accessible at httpsbitbucketorgmaxmalekxcv The mouse volume used in some of theexamples was converted from theDigimouse dataset [43]Thehuman head MRI data set was donated under the conditionof anonymity and cannot be published

Conflicts of Interest

There are no conflicts of interest financial or otherwiseassociated with this paper

Acknowledgments

Thisworkwas supported byTUGrazOpenAccess PublishingFund

Supplementary Materials

The supplement is a quick-start and developerrsquos guide for oursoftware xcv It describes how to extend the program relatedAPIs and good GLSL practices to maximize compatibilityacross graphics drivers The supplement also provides a

hands-on example for the implementation of a new nodetype (Supplementary Materials)

References

[1] D Fortmeier Direct volume rendering methods for needle inser-tion simulation [PhD Dissertation] University of LubeckGermany 2016 httpwwwzhbuni-luebeckdeepubsediss1748pdf

[2] J Zhou X Wang H Cui et al ldquoTopology-aware illuminationdesign for volume renderingrdquo BMC Bioinformatics vol 17 no1 2016

[3] R Srinivasan and S Fang ldquoIntegrating volume morphing andvisualizationrdquoComputational Geometry vol 15 no 1-3 pp 149ndash159 2000

[4] E Smistad M Bozorgi and F Lindseth ldquoFAST frameworkfor heterogeneousmedical image computing and visualizationrdquoInternational Journal for Computer Assisted Radiology andSurgery vol 10 no 11 pp 1811ndash1822 2015

[5] M A Akhloufi F Gariepy and G Champagne ldquoGPGPU real-time texture analysis frameworkrdquo in Proceedings of the Con-ference on Parallel Processing for Imaging Applications 2011 JD Owens I Lin Y Zhang and G B Beretta Eds vol 7872pp 7872-7872 San Francisco Airport CA USA 2011

8 International Journal of Biomedical Imaging

[6] R. P. Broussard and R. W. Ives, “Using a commercial graphical processing unit and the CUDA programming language to accelerate scientific image processing applications,” in Proceedings of the Conference on Parallel Processing for Imaging Applications 2011, J. D. Owens, I. Lin, Y. Zhang, and G. B. Beretta, Eds., SPIE Proceedings, San Francisco Airport, California, USA.

[7] A. Eklund, M. Andersson, and H. Knutsson, “fMRI analysis on the GPU-Possibilities and challenges,” Computer Methods and Programs in Biomedicine, vol. 105, no. 2, pp. 145–161, 2012.

[8] F. Dachille, K. Kreeger, B. Chen, I. Bitter, and A. E. Kaufman, “High-quality volume rendering using texture mapping hardware,” in Proceedings of the 1998 ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, A. E. Kaufman, W. Straßer, G. Knittel, H. Pfister, and S. N. Spencer, Eds., vol. 31, Lisbon, Portugal, 1998.

[9] M. D. Hanwell, K. M. Martin, A. Chaudhary, and L. S. Avila, “The Visualization Toolkit (VTK): Rewriting the rendering code for modern graphics cards,” SoftwareX, vol. 1-2, pp. 9–12, 2015.

[10] J. Sanders and E. Kandrot, CUDA by Example: An Introduction to General Purpose GPU Programming, Addison-Wesley, 2011, http://www.worldcat.org/oclc/699702402.

[11] R. Yang and G. Welch, “Fast image segmentation and smoothing using commodity graphics hardware,” Journal of Graphics (GPU & Game) Tools, vol. 7, no. 4, pp. 91–100, 2002.

[12] F. D. Igual, R. Mayo, T. D. R. Hartley, U. V. Catalyurek, A. Ruiz, and M. Ujaldon, “Exploring the GPU for enhancing parallelism on color and texture analysis,” in Parallel Computing: From Multicores and GPUs to Petascale, Proceedings of the Conference ParCo, B. M. Chapman, F. Desprez, G. R. Joubert, A. Lichnewsky, F. J. Peters, and T. Priol, Eds., vol. 19, pp. 299–306, IOS Press, 2009.

[13] K. O. W. Group, The OpenCL Specification, Version 2.2, 2017, https://www.khronos.org/opencl.

[14] P. Cozzi and C. Riccio, OpenGL Insights, CRC Press, 2012, http://www.openglinsights.com.

[15] M. Bailey and S. Cunningham, “Computer graphics shaders using OpenGL 4.x,” in Proceedings of the ACM SIGGRAPH ASIA 2010 Courses, pp. 1–173, Seoul, Republic of Korea, December 2010.

[16] M. Bailey and S. Cunningham, Eds., Graphics Shaders: Theory and Practice, Taylor & Francis, 2nd edition, 2011.

[17] P. Jeremias and I. Quilez, “Shadertoy: Learn to create everything in a fragment shader,” in Proceedings of the SIGGRAPH Asia 2014 Courses, pp. 1–15, New York, NY, USA, December 2014.

[18] M. Reunanen, Times of change in the demoscene: A creative community and its relationship with technology [Ph.D. dissertation], University of Turku, 2017.

[19] D. Hartmann, Digital Art Natives: Praktiken, Artefakte und Strukturen der Computer-Demoszene, Kulturverlag Kadmos, Berlin, http://digitalartnatives.de.

[20] H. Marin-Vega, G. Alor-Hernandez, R. Zatarain-Cabada, M. L. Barron-Estrada, and J. L. García-Alcaraz, “A Brief Review of Game Engines for Educational and Serious Games Development,” Journal of Information Technology Research, vol. 10, no. 4, pp. 1–22, 2017.

[21] A. L. Turinsky, E. Fanea, Q. Trinh et al., “CAVEman: Standardized anatomical context for biomedical data mapping,” Anatomical Sciences Education, vol. 1, no. 1, pp. 10–18, 2008.

[22] R. Ierusalimschy, L. H. de Figueiredo, and W. C. Filho, “Lua-an extensible extension language,” Software: Practice and Experience, vol. 26, no. 6, pp. 635–652, 1996.

[23] S. Lantinga, Simple DirectMedia Layer, 2017, http://libsdl.org.

[24] O. Cornut, https://github.com/ocornut/imgui.

[25] K. O. W. Group, The OpenGL Graphics System: A Specification (Version 4.5 (Core Profile)), https://www.khronos.org/registry/OpenGL/specs/gl/glspec45.core.pdf.

[26] C. Everitt, G. Sellers, J. McDonald, and T. Foley, https://www.slideshare.net/CassEveritt/approaching-zero-driver-overhead.

[27] Y. Collet, Zstandard, 2015, http://zstd.net.

[28] H. J. Johnson, M. McCormick, L. Ibanez, and T. I. S. Consortium, The ITK Software Guide, Kitware, Inc., 3rd edition, https://itk.org/ItkSoftwareGuide.pdf.

[29] The HDF Group (1997–2017), Hierarchical Data Format, version 5, http://www.hdfgroup.org/HDF5.

[30] https://itk.org/Wiki/ITK/File_Formats.

[31] S. Barrett, https://github.com/nothings/stb.

[32] F. Ritter, T. Boskamp, A. Homeyer et al., “Medical image analysis,” IEEE Pulse, vol. 2, no. 6, pp. 60–70, 2011.

[33] C. P. Botha, “DeVIDE: The Delft visualisation and image processing development environment,” Tech. Rep., Delft Technical University, 2004, https://graphics.tudelft.nl/Publications-new/2004/BO04a.

[34] R. E. Gabr, G. B. Tefera, W. J. Allen, A. S. Pednekar, and P. A. Narayana, “Erratum to: GRAPE: a graphical pipeline environment for image analysis in adaptive magnetic resonance imaging,” International Journal for Computer Assisted Radiology and Surgery, vol. 12, no. 3, pp. 459-459, 2017.

[35] A. E. Szalo, A. Zehner, and C. Palm, “GraphMIC,” in Bildverarbeitung für die Medizin 2015, Informatik aktuell, pp. 395–400, Springer Berlin Heidelberg, Berlin, Heidelberg, 2015.

[36] W. Schroeder, K. Martin, and B. Lorensen, The Visualization Toolkit, Kitware, Inc., 3rd edition, 2004, http://www.worldcat.org/isbn/1930934122.

[37] I. Wolf, M. Vetter, I. Wegner et al., “The medical imaging interaction toolkit,” Medical Image Analysis, vol. 9, no. 6, pp. 594–604, 2005.

[38] M. Nolden, S. Zelzer, A. Seitel et al., “The medical imaging interaction toolkit: Challenges and advances: 10 years of open-source development,” International Journal for Computer Assisted Radiology and Surgery, vol. 8, no. 4, pp. 607–620, 2013.

[39] W.-K. Jeong, H. Pfister, and M. Fatica, “Medical image processing using GPU-accelerated ITK image filters,” GPU Computing Gems Emerald Edition, pp. 737–749, 2011.

[40] J. Blanchette and M. Summerfield, C++ GUI Programming with Qt 4, Prentice Hall, 2006.

[41] G. van Rossum, “Python programming language,” in Proceedings of the 2007 USENIX Annual Technical Conference, J. Chase and S. Seshan, Eds., Santa Clara, CA, USA, 2007, https://www.usenix.org/publications/proceedings?f[0]=im_group_audience%3A114.

[42] A. Polukhin, Boost C++ Application Development Cookbook, Packt Publishing, 2nd edition, 2017.

[43] B. Dogdas, D. Stout, A. F. Chatziioannou, and R. M. Leahy, “Digimouse: a 3D whole body mouse atlas from CT and cryosection data,” Physics in Medicine and Biology, vol. 52, no. 3, pp. 577–587, 2007.

[44] S. Roettger, http://lgdv.cs.fau.de/External/vollib/.

[45] P. F. Felzenszwalb and D. P. Huttenlocher, “Distance transforms of sampled functions,” Theory of Computing: An Open Access Journal, vol. 8, pp. 415–428, 2012.

[46] K. E. Batcher, “Sorting networks and their applications,” in Proceedings of the April 30–May 2, 1968, Spring Joint Computer Conference, p. 307, Atlantic City, New Jersey, April 1968.

[47] J. Barczak, OpenGL Is Broken, The Burning Basis Vector [blog], 2014, http://www.joshbarczak.com/blog/?p=154.

[48] T. K. V. W. Group, Vulkan 1.0.66: A Specification, 2017, https://www.khronos.org/vulkan.

Figure 6: Comparison of our graph editor (b) with the interface of DeVIDE (a). Our graph editor shows considerably more detail, including a preview when hovering over inputs and outputs, where possible. It also checks for misuse; that is, it ensures connector type compatibility and prevents cycles. The depicted DeVIDE graph did not work despite our trying multiple variants, and there was no sign of an error when building the graph.
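The Figure 6 caption states that the editor ensures connector type compatibility and prevents cycles. The following is a minimal C++ sketch of how such a check could work, under a simplified graph model. The names `Node`, `Edge`, `Graph`, and `connect` are hypothetical and are not taken from xcv. Type compatibility is modeled as exact string equality. A new edge from node A to node B is rejected if B can already reach A, because the edge would then close a cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node model (not xcv's API): typed input and output connectors.
struct Node {
    std::vector<std::string> inputTypes;   // e.g. "image3d"
    std::vector<std::string> outputTypes;  // e.g. "image2d"
};

// An edge runs from output `fromOut` of node `fromNode`
// to input `toIn` of node `toNode`.
struct Edge {
    std::size_t fromNode, fromOut, toNode, toIn;
};

class Graph {
public:
    std::size_t addNode(Node n) {
        nodes_.push_back(std::move(n));
        return nodes_.size() - 1;
    }

    // Records the edge and returns true only if both connectors exist,
    // their types match, and the edge would not create a cycle.
    bool connect(std::size_t from, std::size_t out, std::size_t to, std::size_t in) {
        assert(from < nodes_.size() && to < nodes_.size());
        if (out >= nodes_[from].outputTypes.size() || in >= nodes_[to].inputTypes.size())
            return false;  // no such connector
        if (nodes_[from].outputTypes[out] != nodes_[to].inputTypes[in])
            return false;  // incompatible connector types
        if (from == to || reaches(to, from))
            return false;  // `to` already feeds into `from`: edge would close a cycle
        edges_.push_back({from, out, to, in});
        return true;
    }

private:
    // Iterative depth-first search: can `target` be reached from `start`?
    bool reaches(std::size_t start, std::size_t target) const {
        std::vector<bool> seen(nodes_.size(), false);
        std::vector<std::size_t> stack{start};
        while (!stack.empty()) {
            std::size_t cur = stack.back();
            stack.pop_back();
            if (cur == target) return true;
            if (seen[cur]) continue;
            seen[cur] = true;
            for (const Edge& e : edges_)
                if (e.fromNode == cur) stack.push_back(e.toNode);
        }
        return false;
    }

    std::vector<Node> nodes_;
    std::vector<Edge> edges_;
};
```

Because reachability is checked before an edge is inserted, the graph stays acyclic at all times. Downstream evaluation can therefore always rely on a valid topological order. A real editor would probably also restrict each input to a single incoming edge; that rule is left out of this sketch.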

Figure 7: A dataflow graph that visualizes a mouse CT volume. Both “3D Volume” nodes render a volume texture into a 2D image, each with its own settings. In this example, the resulting 2D image is set to 4K resolution (3840 × 2160). Each node displays an estimate of the time, in milliseconds, that it spent computing on the GPU. Any parameter or image change propagates downstream, so the whole graph updates itself as necessary.
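The Figure 7 caption notes that parameter and image changes propagate downstream so that the graph updates itself as necessary. One common way to achieve this, sketched below under stated assumptions, has two steps:

1. Mark the changed node, and every node reachable from it, as dirty.
2. Re-evaluate only the dirty nodes, in topological order (Kahn's algorithm).

The class `DataflowGraph` and its methods are hypothetical and are not xcv's API. The per-node "evaluation" is only a counter that stands in for the actual GPU work.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical sketch (not xcv's API): nodes are indices 0..n-1 and
// consumers_[i] lists the nodes that read node i's output.
class DataflowGraph {
public:
    explicit DataflowGraph(std::size_t n)
        : consumers_(n), dirty_(n, true), evalCount_(n, 0) {}  // nothing computed yet

    void addEdge(std::size_t from, std::size_t to) {
        assert(from < consumers_.size() && to < consumers_.size());
        consumers_[from].push_back(to);
        markChanged(to);  // `to` has a new input, so it and its consumers must recompute
    }

    // A parameter or input of `node` changed: mark it and everything
    // downstream of it as dirty.
    void markChanged(std::size_t node) {
        std::vector<bool> visited(consumers_.size(), false);
        std::vector<std::size_t> stack{node};
        while (!stack.empty()) {
            std::size_t cur = stack.back();
            stack.pop_back();
            if (visited[cur]) continue;
            visited[cur] = true;
            dirty_[cur] = true;
            for (std::size_t c : consumers_[cur]) stack.push_back(c);
        }
    }

    // Re-evaluates dirty nodes in topological order (Kahn's algorithm), so
    // every node runs after its inputs. Assumes the graph is acyclic.
    // Returns the number of nodes that were evaluated.
    std::size_t update() {
        const std::size_t n = consumers_.size();
        std::vector<std::size_t> indegree(n, 0);
        for (const auto& outs : consumers_)
            for (std::size_t c : outs) ++indegree[c];
        std::queue<std::size_t> ready;
        for (std::size_t i = 0; i < n; ++i)
            if (indegree[i] == 0) ready.push(i);
        std::size_t evaluated = 0;
        while (!ready.empty()) {
            std::size_t cur = ready.front();
            ready.pop();
            if (dirty_[cur]) {
                ++evalCount_[cur];  // stand-in for dispatching the node's GPU work
                dirty_[cur] = false;
                ++evaluated;
            }
            for (std::size_t c : consumers_[cur])
                if (--indegree[c] == 0) ready.push(c);
        }
        return evaluated;
    }

    std::size_t evalCount(std::size_t node) const { return evalCount_[node]; }

private:
    std::vector<std::vector<std::size_t>> consumers_;
    std::vector<bool> dirty_;
    std::vector<std::size_t> evalCount_;
};
```

The topological order guarantees that a node runs only after all of its inputs are up to date. Unchanged branches, such as a second render node fed only by an unchanged source, are skipped entirely. This relies on the graph being acyclic, which the editor's cycle check enforces.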
