Image Treatment: Implementing Extended Depth of Field with NVIDIA® CUDA®
M. Hernández Ariza, C. J. Barrios Hernández, A. Plata Gómez and D. A. Sierra Bueno
Universidad Industrial de Santander, Bucaramanga, Colombia
http://sc3.uis.edu.co
Extended depth of field (EDF) is a method used to analyze and treat specific image zones in optical research. Due to the complexity of EDF and the potentially large volume of data processed in optics problems, EDF is a good candidate for parallel architectures. Starting from a large set of images taken through a microscope, we propose an implementation of parallel extended depth of field targeting massively parallel computing machines based on NVIDIA® GPUs. The proposed algorithms were implemented using NVIDIA® CUDA® and MPI, with interesting performance results in terms of efficiency across different platforms while maintaining accuracy.
Image Processing with EDF
PARALLEL EDF
Image stack and final topographic image
We used different platforms: a GPGPU cluster at the Grenoble Informatics Laboratory, named IDGraf, with 6 NVIDIA® TESLA® C2050 GPUs plus 1 NVIDIA® GeForce® GTX 295, 72 GB of RAM and 2 Intel Xeon X5650 processors. The second platform was a GPGPU Beowulf cluster with an NVIDIA® FX 570 GPU, 2 GB of RAM and 2 Intel 2.2 GHz quad-core processors, and the third comprised some regular nodes of the Grid'5000 French grid platform and some generic nodes of the GridUIS-2 platform (both without GPUs). The following results correspond to tests on high-resolution images of 1920×2560 pixels.
The grid defined for NVIDIA® CUDA® has dimensions equal to the image resolution (1920×2560 threads). Each thread at position i-j computes over the i-j element of the matrix. In the MPI code, each process computes over all i-j elements of the matrix.
The idea of using different platforms is to observe behavior across a range of configurations: from generic nodes of Beowulf clusters on grid computing platforms to sophisticated GPU infrastructures and non-sophisticated resources. Results show the best performance with the NVIDIA® CUDA® implementation on all platforms. The main reason for this performance is the possibility of launching the CUDA® kernel to process all i-j elements of the matrix in parallel simultaneously. In contrast, the MPI implementation is limited by the communication cost among nodes.
Tests and Results
Parallel EDF
Computation of the R, G and B components for each image in the stack.
The variation of color intensity is computed for each image pixel by observing the neighboring cells, to determine the zone with the highest degree of focus. From the G matrix of each image, a variance matrix is computed to identify the focused points of the image.
G Matrix (Enhanced)
From the variance matrices, a position matrix is computed to build the topographic matrix. Each i-j position of this matrix holds the position occupied in the image stack by the image with the highest intensity-variance value for the i-j pixel. This procedure yields the topographic image, or relief, of the object scanned by the microscope.
Topography
Finally, the R, G and B components are computed from each i-j element of the position matrix. This process retains, for each component, the maximum color-intensity values among the color-intensity values of the stack images. The final image built this way is focused along all axes.
R, G and B image components at the 9th stack position.
Topographic image, focused image and render
Final Images
[Figure: variance computation. Example 5×5 R, G and B arrays from the image stack; the same arrays with the zero padding used for the neighborhood variance; a sample variance matrix; and the R, G and B elements of the final focused image.]
Acknowledgements
L. Camargo Forero and A. Lobo, engineers of the Scientific and High Performance Computing Service at UIS (SC3-UIS); Professors B. Raffin (INRIA Rhône-Alpes), O. Richard and Y. Denneulin (LIG Laboratory and the Grid'5000 Project); Mr. M. Lansen of NVIDIA® Corporation; and the Optics and Signal Treatment Research Group at UIS (GOTS-UIS).
Experiments presented here were carried out using the Grid'5000 experimental testbed, developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several universities as well as other funding bodies (see https://www.grid5000.fr), and the GridUIS-2 platform, developed under the Universidad Industrial de Santander (UIS) High Performance and Scientific Computing Service development action with support from the UIS Vicerrectoría de Investigación y Extensión (VIE-UIS) and several UIS research groups and academic units as well as other funding bodies (see https://grid.uis.edu.co).