
Automatic Handling of Tissue Microarray Cores in High-Dimensional Microscopy Images


IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 18, NO. 3, MAY 2014 999

M. del Milagro Fernández-Carrobles, Gloria Bueno, Oscar Déniz, Senior Member, IEEE, Jesús Salido, Member, IEEE, and Marcial García-Rojo

Abstract—This paper describes a specific tool for automatically segmenting and archiving tissue microarray (TMA) cores in microscopy images at different magnifications. TMA enables researchers to extract small cylinders of a single tissue (core sections) from histological sections and arrange them in an array on a paraffin block such that hundreds can be analyzed simultaneously. A crucial step to improve the speed and quality of this process is the correct localization of each tissue core in the array. However, the tissue cores are usually not aligned in the microarray, the TMA cores are incomplete, and the images are noisy and have distorted colors. We develop a robust framework to handle core sections under these conditions. The algorithms are able to detect, stitch, and archive the TMA cores at different magnifications. Once the TMA cores are segmented, they are stored in a relational database, allowing their processing for further studies of benign-malignant classification. The method was shown to be reliable for handling the TMA cores, thereby enabling further large-scale molecular pathology research.

Index Terms—High-dimensional image analysis, microscopy images, tissue microarray (TMA) core segmentation, whole slide imaging.

I. INTRODUCTION

THE tissue microarray (TMA) represents a powerful new technology designed to assess the expression of proteins or genes across large sets of tissue specimens [1]. A TMA is an ordered array of up to several hundred small cylinders of single tissues (core sections) in a paraffin block from which sections can be cut and treated like any other histological section, using immunohistochemistry (IHC) for protein targets and in situ hybridization to detect gene expression or chromosomal alterations [2], [3]. A TMA allows rapid and reproducible investigations of biomarkers. The integration of TMA and clinical pathology data is emerging as a powerful approach to molecular profiling of human cancer [4], [5].

Manuscript received July 27, 2013; revised; accepted September 12, 2013. Date of publication September 20, 2013; date of current version May 1, 2014. This work was supported by the Spanish Research Ministry under Project DPI2008-06071.

M. del M. Fernández-Carrobles, G. Bueno, O. Déniz, and J. Salido are with the E.T.S.I. Industriales, Universidad de Castilla-La Mancha, 13071 Ciudad Real, Spain.

M. García-Rojo is with the Department of Pathology, Hospital General de Ciudad Real, 13701 Ciudad Real, Spain.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JBHI.2013.2282816

Another use of the TMA is to provide random samples of a representative lesion, which may be evaluated by automated methods in order to achieve an objective diagnosis in pathology, which is nowadays one of the diagnostic laboratories with the most human intervention (manual work) and the most subjective assessment [6]–[8]. The high speed of scanning, the lack of significant damage to donor blocks, and the regular arrangement of scanned specimens substantially facilitate automated analysis [9].

However, working with TMA is difficult, both in data acquisition and in its management and interpretation. The use of IHC with TMA generates large amounts of information, which requires careful analysis. Currently, this analysis is done manually under the microscope, which, besides being a tedious job that hinders the workflow, is subject to errors due to subjective interpretations by the specialists. The automatic analysis of TMA data and multicenter studies is still a challenge [2], [10]–[12]. The use of automatic acquisition systems for various digital imaging and tissue staining, as well as the development of tools for processing these images, will help to alleviate these difficulties.

Another difficulty in TMA analysis is that the cores are usually neither aligned nor regular and do not have enough tissue to be evaluated, in addition to the typical problems of digital images such as noise and distortion. This may lead to lost cores in the detection process. Thus, there is a need to develop reliable tools to acquire, share, and assess microarrays and related data.

At the moment, and as far as the authors know, only four systems have been described in the literature for TMA handling. Three commercial tools are also available. The research works are those of Della Mea et al. [10], Demichelis et al. [11], Shaknovich et al. [13], and Liu et al. [14], [6], and a preliminary report by Morgan et al. [15]. The commercial tools are TMALab (APERIO) [16], TMADesigner2 (ALPHELYS software) [17], and TAMEE [18].

The systems by Liu et al. [6] and Shaknovich et al. [13] are based on commercial products, particularly Microsoft Excel and Adobe Photoshop software together with some additional basic image processing tools. The main drawback is that they have been developed for a specific study. Besides, they work with low resolution images, i.e., magnification lower than 10×. Moreover, the TMA database provided by Liu et al. presents core images with overlapping regions. Thus, the image tiles are not stitched together but tiled. The system by Demichelis et al. [11] is web-based and accounts for patient data as well as biomarker experiment design data. The system provides a complete algorithm to find the best positioning of the cores; nevertheless, it is necessary to indicate the number of rows and columns to the system beforehand. The systems by Morgan et al. and Della Mea et al. are similar. They are web-based and open source. Della Mea et al. introduce a website dedicated to management, including archiving and retrieval. The system includes more tools for this management than the aforementioned systems. However, it does not fully cover data analysis and automatic image processing. Finally, except for the TMA-boost system, the aforementioned systems have not been demonstrated for high-dimensional images, that is, TMA digitized at large magnifications.

2168-2194 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Fig. 1. TMA core segmentation and archiving process. The tissue core detection and selection algorithms depend on the availability of a TMA thumbnail. Some whole slide motorized microscopes and scanners do not provide a thumbnail, in which case the minimum magnification (5×) of the digital image is used. Red and green arrows are used to illustrate the process when using a thumbnail or a 5× image, respectively.

The main drawback of the commercial tools is manual processing. In these tools, the user needs to define the number of rows and columns on the TMA or select the corner core positions manually (with the mouse). Besides, if there is an isolated core (a nonaligned core), the user needs to provide its location to the tool, as in TMALab.

Thus, there is a need for a tool addressing automatic TMA analysis, including tissue core location, segmentation, and rigid registration of digital microscopic images acquired at different magnifications (5×, 10×, 20×, and 40×) from different devices. This is the aim of the present study, which describes the algorithms implemented for the automatic detection and storage of the TMA cores and their pathology information for further studies of their benign or malignant character. The algorithms work for different whole slide scanning devices in pathology, namely motorized microscopes and scanners.

Section II describes the methods used for this study. This includes the segmentation procedure applied to the TMA images at different magnifications and the storage process. Section III describes the materials used, including the experimental database. Section IV describes the results obtained with the proposed method, and finally, in Section V, the main conclusions are drawn.

II. METHODS

The first objective of this study is the automatic segmentation of the TMA cores prior to subsequent archiving and processing. The segmentation is applied to TMA images acquired at 5×, 10×, 20×, and 40×. This process includes the TMA core detection, selection, and extraction. Once the TMA cores are segmented, they are archived. The archiving process preserves all the pathological information in a relational database for further classification of the selected cores. The whole segmentation and archiving process is illustrated in Fig. 1, and the methods are described as follows.

A. Detection

The algorithm developed for the detection of the tissue cores depends on the availability of a thumbnail of the TMA. If a thumbnail of the microscopic image is available, then the cores are detected on this image. When there is no thumbnail, the core images are captured at a magnification of 5×. Some whole slide scanning devices do not provide a thumbnail of the image, and their minimum digital image is at 5× [19]. Then, the coordinates of each core are calculated for 5×, 10×, 20×, and 40×. It must be kept in mind that the 5× images are larger than 500 MB and therefore some processing operations may lead to memory errors. To avoid this, the 5× image is divided into six pieces and the algorithm is applied to each of them. The detection algorithm is based on image processing methods [20]. The algorithm proceeds as follows.


1) The color image is converted into a gray image.

2) An erosion of five iterations with a 3×3 kernel is performed on the image. Erosion, E(x, y), is done by means of a convolution in which the minimum value of the neighborhood pixels is selected. This operation can eliminate artifacts of the input image, I(x, y), that may be mistaken for cores, thus reducing the number of false positives.

3) (a) If there is a thumbnail image, then an adaptive thresholding is performed. The thresholding operation compares the values of the image pixels with a threshold value. In the case of binary thresholding, when the value of the pixel I(x, y) is greater than the established threshold value (T), the new pixel I′(x, y) takes the maximum value M (with M equal to 225). On the contrary, if the value of the pixel I(x, y) is lower than the threshold value, then it takes the value 0. Adaptive thresholding analyzes each pixel of the image with respect to its local environment: T is a threshold calculated individually for each pixel. In our case, the threshold value T is the mean of a blockSize × blockSize neighborhood (95×95 in microscope thumbnail images and 75×75 in scanner thumbnail images) of I(x, y), minus a constant C (C is equal to 1).

(b) If there is no thumbnail image, then the following three operations are applied to the 5× image.

i) Template matching: Template matching uses a normalized correlation coefficient method. The correlation coefficient indicates the extent to which the input template coincides with the image, as in (1), (2), and (3). The template, T(x′, y′), corresponds to a core sample obtained from one of the TMAs at 5×. A perfect coincidence with the input template has value 1, whereas no coincidence gives −1 and the value 0 indicates no correlation.

$$R_{\mathrm{ccoeff}}(x, y) = \sum_{x', y'} \left[ T'(x', y') \cdot I'(x + x', y + y') \right]^2 \tag{1}$$

where

$$T'(x', y') = T(x', y') - \frac{1}{w \cdot h} \sum_{x'', y''} T(x'', y'') \tag{2}$$

and w and h are the image width and height, respectively.

$$I'(x + x', y + y') = I(x + x', y + y') - \frac{1}{w \cdot h} \sum_{x'', y''} I(x + x'', y + y'') \tag{3}$$

The correlation coefficient method can be normalized by a factor Z(x, y) to obtain better results; see (4). This helps to reduce illumination differences between the template and the image. Normalization is always performed in the same manner.

$$Z(x, y) = \sqrt{\sum_{x', y'} T(x', y')^2 \cdot \sum_{x', y'} I(x + x', y + y')^2} \tag{4}$$

The normalized correlation coefficient method is therefore represented as

$$R_{\mathrm{ccoeff\_norm}}(x, y) = \frac{R_{\mathrm{ccoeff}}(x, y)}{Z(x, y)} \tag{5}$$

ii) Binary thresholding: Then, a binary thresholding with threshold T equal to 100 is applied to the 5× image.

iii) Morphological opening: To remove noise and false positive TMA cores, a morphological opening of six iterations with a 3×3 kernel is applied.

4) Finally, a contour finding operator is applied to find the core contours. This algorithm computes contours from binary images, such as images created by a Canny operator, which have edge pixels in them, or images created by a binary thresholding, in which the edges are implicit as boundaries between positive and negative regions. The algorithm retrieves contours from the binary image using the algorithm of Suzuki [21]. It can detect and extract the contour pixels that divide each segment of the image, storing them as sequences so that they can later be manipulated individually.

Fig. 2. Core divided into four parts. Only the first and third parts are captured; then, the core is reconstructed by taking the width and height of the largest part, (w3, h3), and extracting the same measure in the remaining parts.
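Steps 2, 3(a), and 3(b)(i) of the detection procedure can be sketched in a few lines of pure Python. This is a minimal illustration on nested lists, not the implementation used in this work: the function names are hypothetical, windows are simply clipped at the image border (an assumption), and `ncc` uses the usual zero-mean normalized cross-correlation, in which the products are not squared and the normalizer is built from the mean-subtracted values, so that the score spans −1 to 1 as the text describes. This may differ in detail from (1) and (4) as printed.

```python
from math import sqrt

def erode(img, iterations=1):
    """Grayscale erosion with a 3x3 kernel: each pixel becomes its neighborhood minimum."""
    h, w = len(img), len(img[0])
    for _ in range(iterations):
        img = [[min(img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2)))
                for x in range(w)]
               for y in range(h)]
    return img

def adaptive_threshold(img, block_size, c=1, max_value=225):
    """Set a pixel to max_value when it exceeds the local mean minus c, else to 0.

    max_value defaults to M as printed in the text; windows are clipped at
    the border, which is an assumption of this sketch.
    """
    h, w = len(img), len(img[0])
    r = block_size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            t = sum(window) / len(window) - c  # per-pixel threshold T
            row.append(max_value if img[y][x] > t else 0)
        out.append(row)
    return out

def ncc(image, template, x, y):
    """Zero-mean normalized correlation of the template with the image window at (x, y)."""
    th, tw = len(template), len(template[0])
    t = [v for row in template for v in row]
    win = [image[y + dy][x + dx] for dy in range(th) for dx in range(tw)]
    t_mean, w_mean = sum(t) / len(t), sum(win) / len(win)
    t0 = [v - t_mean for v in t]
    w0 = [v - w_mean for v in win]
    z = sqrt(sum(v * v for v in t0) * sum(v * v for v in w0))
    return sum(a * b for a, b in zip(t0, w0)) / z if z else 0.0
```

With this sketch, five erosion iterations would be `erode(gray, iterations=5)` and the microscope thumbnail setting would be `adaptive_threshold(gray, block_size=95, c=1)`. Scanning `ncc` over every valid `(x, y)` gives a score map whose peaks are candidate core positions.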

B. Tissue Core Selection

The cores that were tessellated by the initial division of the 5× input image into six pieces must be joined together. This join is carried out distinguishing two cases.

1) Cores divided into two parts: These cores can take two different forms, those that are divided by the x-axis and those that are divided by the y-axis.

2) Cores divided into four parts: These cores present a greater difficulty than the previous ones, because the contours of the four parts are not always found by the application. So, when at least one piece of the core is found, the core is extracted. Besides, to ensure that the core is extracted completely, the width and height of the largest piece found are used for extracting the other pieces of the core; see Fig. 2.
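The bookkeeping behind the two-part case can be sketched as follows. The text does not state how the six pieces are laid out, so the 2 × 3 grid below is an assumption, as are the helper names; boxes are `(x, y, w, h)` tuples. Each piece-local bounding box is shifted into whole-image coordinates and the two halves of a split core are then merged into one enclosing rectangle.

```python
def split_into_pieces(width, height, rows=2, cols=3):
    """Return (x, y, w, h) rectangles that tile a width x height image."""
    xs = [width * c // cols for c in range(cols + 1)]
    ys = [height * r // rows for r in range(rows + 1)]
    return [(xs[c], ys[r], xs[c + 1] - xs[c], ys[r + 1] - ys[r])
            for r in range(rows) for c in range(cols)]

def to_global(box, piece):
    """Shift a piece-local bounding box into whole-image coordinates."""
    return (box[0] + piece[0], box[1] + piece[1], box[2], box[3])

def merge_boxes(a, b):
    """Smallest rectangle containing both boxes, e.g. the two halves of a split core."""
    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
    x1 = max(a[0] + a[2], b[0] + b[2])
    y1 = max(a[1] + a[3], b[1] + b[3])
    return (x0, y0, x1 - x0, y1 - y0)
```

For a core cut by the border between the first two pieces, `merge_boxes(to_global(left, pieces[0]), to_global(right, pieces[1]))` recovers a single box. The four-part case would additionally need the width and height of the largest piece, as described for Fig. 2, which this sketch does not cover.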

Next, a selection of the detected cores is made before performing the final extraction. Thus, those cores that do not fulfill the requirements to be considered as valid or doubtful are discarded at this point. The conditions to consider a given core as valid or doubtful are based on the amount of segmented tissue. These conditions are fulfilled if: 1) the percentage of background pixels measured after the thresholding is lower than 82% for a valid core and between 82% and 92% for a doubtful core and 2) the percentage of pixels with gray values similar to those of the air bubbles is lower than 12% for a valid core or between 12% and 32% for a doubtful core. Air bubbles may be accidentally created within the paraffin block during TMA preparation. The latter condition is measured on a square region inscribed in the core. Since only 78.5% of the pixels are tissue when a perfect core is detected (that is, the area of a circle inscribed in a square), the first condition ensures that at least one fourth of the core is detected, which is the minimum percentage needed to carry out a diagnosis. In the case of a doubtful core, between one fourth and one tenth of the core would be detected. In this way, false positive detections due to noise and imperfections on the glass slide (fragments, dirt, glue, or air bubbles produced in the preparation process or missing core sections) are avoided. These conditions are illustrated in Fig. 3.

Fig. 3. Tissue core selection based on the amount of segmented tissue.
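The selection rule and the arithmetic behind the "one fourth" and "one tenth" figures can be written out as a short sketch. The function names are hypothetical, and two details are assumptions not stated in the text: both conditions are combined with a logical AND, and the upper limits of the doubtful ranges are treated as inclusive.

```python
from math import pi

def classify_core(background_pct, bubble_pct):
    """Label a detected core from its background and air-bubble percentages."""
    if background_pct < 82 and bubble_pct < 12:
        return "valid"
    if background_pct <= 92 and bubble_pct <= 32:
        return "doubtful"
    return "discarded"

def detected_fraction(background_pct):
    """Fraction of a perfect circular core covered by the non-background pixels."""
    circle_in_square = pi / 4  # about 0.785, the 78.5% quoted in the text
    return (100 - background_pct) / 100 / circle_in_square
```

At the 82% limit, 18% of the bounding square is tissue, and 0.18 / 0.785 is roughly 0.23, or about one fourth of a perfect core. At the 92% limit, 0.08 / 0.785 is roughly 0.10, or about one tenth.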

C. Tissue Core Positioning and Extraction

Once the TMA cores are selected, they go through the positioning and extraction phase. For the extraction process, the minimum bounding rectangle of the core is selected. This bounding box indicates the position coordinates of the core inside the whole TMA image. These coordinates make it possible to control the core position and enumerate the tissue cores. Positioning is done from the bottom to the top of the thumbnail and is defined by the y-coordinate of the upper left corner of each bounding box. Thus, the core with the lowest y-coordinate will be the first one. In this way, a row-major ordering methodology is used to store the cores; see Fig. 4(a). Our tool also provides the thumbnail image with the enumerated cores; see Fig. 4(b). In this way, the pathologist can easily locate each TMA core. The row-major ordering methodology driven by the y-coordinate has been used to cope with paired-orientation alignment problems on the TMA rows and columns; see Fig. 5.
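A row-major enumeration driven by the y-coordinate might look like the sketch below. It is only one plausible reading of the description: the `row_tolerance` parameter (how far the top edges of two boxes may differ while still counting as the same row), the left-to-right order inside a row, and the numbering from 1 are all assumptions introduced for illustration.

```python
def enumerate_cores(boxes, row_tolerance):
    """Number cores row by row from (x, y, w, h) boxes; returns {core_id: box}."""
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):
        # Stay in the current row while the top edge is close to that row's first core.
        if rows and box[1] - rows[-1][0][1] <= row_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    ordered = [b for row in rows for b in sorted(row, key=lambda b: b[0])]
    return {i + 1: b for i, b in enumerate(ordered)}
```

Grouping by a tolerance rather than by exact y values is what lets slightly misaligned cores in the same row keep consecutive numbers.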

It must be taken into account that when working with the 5× image, the positioning needs to be run twice, because the 5× image was previously divided into six pieces to avoid memory errors: first separately for the top and the bottom of the 5× image and then for both together; see Fig. 6. The positioning requires knowledge of all the cores in the 5× image, although the initial image is divided into six pieces. This method saves time because images are loaded and unloaded less often.

Fig. 4. Positioning of TMA cores in the thumbnail. (a) Positioning. (b) Final thumbnail.

Fig. 5. Paired-orientation alignment problems on the TMA rows and columns. (a) Row problems. (b) Column problems.

Fig. 6. Positioning of TMA cores at 5×.

Once the positioning is done, the tissue cores are extracted at 5×, 10×, 20×, and 40×. The relationship between the pixels of the thumbnail image of one core and the images at the other magnifications is straightforward, because each image in ascending order is always double the previous one, apart from the change between the thumbnail and the 5×. The 5× image is eight times larger than the thumbnail, the 10× image is double the 5×, and so on. Table I shows the pixels per micrometer (pixel/μm) at each of the magnifications in its first column and the corresponding micrometers per pixel (μm/pixel) in the second column.

TABLE I. RELATIONSHIP BETWEEN PIXELS AND MAGNIFICATIONS

Fig. 7. Difference between tiling and stitching. (a) Tiling or union of the tiles. It produces a badly reconstructed core with duplicated regions. (b) Stitching or rigid registration of the image tiles. It copes with overlapped regions produced by the scanning process. Thus, the duplicated regions shown in (a) are eliminated in (b).
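The coordinate relationship between the thumbnail and the higher magnifications reduces to a fixed scale factor per level. The sketch below assumes that "eight times larger" and "double" refer to the linear size along each axis; the dictionary and function names are hypothetical.

```python
# Linear scale of each image relative to the thumbnail, as described in the text:
# 5x is eight times the thumbnail and every further magnification doubles.
SCALE_FROM_THUMBNAIL = {"thumbnail": 1, "5x": 8, "10x": 16, "20x": 32, "40x": 64}

def scale_box(box, source, target):
    """Map an (x, y, w, h) bounding box from one magnification to another."""
    num, den = SCALE_FROM_THUMBNAIL[target], SCALE_FROM_THUMBNAIL[source]
    return tuple(v * num // den for v in box)
```

For example, a thumbnail box `(10, 20, 30, 40)` maps to `(80, 160, 240, 320)` at 5×, so a core only needs to be located once and can then be cut from every magnification.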

One of the main problems when extracting the tissue cores at 5×, 10×, 20×, and 40× is that a core is not located in a single tile but is divided among several image tiles. Then, the image tiles must be stitched together. The stitching is not a simple union of the image tiles but a rigid registration (see Fig. 7).

Thus, the stitching method consists of three stages [22]: 1) A description of the scene (image tiles) is generated through graph theory. Each tile is interpreted as a node and each node is connected to its adjacent nodes through edges. Isolated nodes in the graph (nodes without neighbors in a radius of 2 tiles) are excluded. 2) The optimum overlap between fragments is calculated. We explore the graph, selecting pairs of nodes connected by an edge, and an initial search is performed by a rigid registration. The initial search calculates the mean absolute error (MAE) over the tiles. For this purpose, we define a search area (usually 50% of the image size) and displace one of the tiles (called the mobile tile) 4 pixels in each direction. The MAE is calculated at each displacement. The intramodal registration developed by Thévenaz and Unser [23] is used in this rigid registration. This method uses three components: an interpolator based on a cubic beta-spline model, a cost function based on the mean squared error, and a Levenberg–Marquardt optimizer. 3) Then, we calculate the global registration. In this stage, the different fragments are combined using the information obtained in the previous stage. To that end, the initial graph is transformed into its minimum spanning tree by Kruskal's algorithm. Adjacent tiles (nodes) may have variations in intensity; for this reason, a blending is performed over the overlap zones.
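The initial MAE search of stage 2 can be illustrated with a deliberately simplified sketch: two side-by-side tiles, displacement along one axis only, and an exhaustive scan over candidate overlap widths. The function names and the `max_overlap` and `step` parameters are hypothetical, and the sketch does not include the spline-based refinement of [23].

```python
def mae(a, b):
    """Mean absolute error between two equally sized 2-D lists."""
    diffs = [abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def best_overlap(left, right, max_overlap, step=1):
    """Overlap width (in columns) that minimizes the MAE between two adjacent tiles."""
    best = None
    for k in range(1, max_overlap + 1, step):
        strip_left = [row[-k:] for row in left]   # last k columns of the fixed tile
        strip_right = [row[:k] for row in right]  # first k columns of the mobile tile
        err = mae(strip_left, strip_right)
        if best is None or err < best[1]:
            best = (k, err)
    return best
```

The function returns the overlap together with its error, so the error could also serve as an edge weight when the tile graph is later reduced to a minimum spanning tree.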

After the stitching, the tissue cores at 5×, 10×, and 20× are segmented from the image using the coordinates obtained in the positioning process. The 40× image usually is not segmented but is built as a single image after the stitching.

Fig. 8. Computational time for TMA core extraction and archiving.

D. Tissue Core Archiving

Each segmented core is saved as an individual image and its information is archived in a relational database. This allows further consultation or modification of the information. This part is important in our framework because the specimen is identified by its core ID and its TMA ID and can be located in the TMA by its (x, y)-coordinates. The core ID may also be identified in the enumerated thumbnail/5× image [see Fig. 4(b)]. In this way, the database information follows the specimen management standards to support the reporting processes in anatomic pathology (AP) laboratories for sharing or exchanging structured AP reports in which observations can be explicitly bound to the whole slide image (WSI) or to regions of interest (ROI) in TMA core images [24]. Thus, it is important to keep the original thumbnail coordinates of each core. With this, pathologists will not only be able to recognize the core in the TMA, but will also be able to quickly find the core at the microscope if they want to observe the TMA through the microscope.

The database created is composed of six interrelated tables, where the main link is the table cores_thumbnail_TMA, which has information about the thumbnail image. This table has nine attributes: TMA ID, core ID (position of the core in the TMA), the validity of the core (a core can be considered as valid or doubtful), core image, coordinates of the core in the thumbnail image (the minimum bounding rectangle or bounding box), tissue type, stain, evaluation (its classification: type of malignancy or benignity), and comments (pathologist annotations about this core).

The other tables correspond to tissue cores belonging to the images at 5×, 10×, 20×, and 40×. They have four attributes: TMA ID, core ID, its coordinates (the minimum bounding rectangle coordinates and the relationship with the thumbnail), and core image (at this magnification). The table about TMA information has two attributes: TMA ID and enumerated TMA thumbnail/5× image.

Fig. 9. Quantitative validation by means of ROC analysis. (a) ROC analysis for GIST TMA cores digitized with the microscope and the scanner. (b) ROC analysis for HE-BC TMA cores digitized with the microscope and the scanner.

Fig. 10. Type I and II errors in TMA core extraction. (a) Type I errors (FP). (b) Type II errors (FN).
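The table layout described in this subsection can be sketched with Python's built-in `sqlite3` module. This is only an illustration: SQLite stands in for the MySQL database, only the 5× table is shown (the 10×, 20×, and 40× tables would be analogous), and every table and column name except `cores_thumbnail_TMA` is hypothetical, as is the sample row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tma (
    tma_id TEXT PRIMARY KEY,
    enumerated_image BLOB              -- enumerated thumbnail or 5x image
);
CREATE TABLE cores_thumbnail_TMA (
    tma_id TEXT REFERENCES tma(tma_id),
    core_id INTEGER,                   -- position of the core in the TMA
    validity TEXT CHECK (validity IN ('valid', 'doubtful')),
    core_image BLOB,
    x INTEGER, y INTEGER, w INTEGER, h INTEGER,  -- bounding box in the thumbnail
    tissue_type TEXT,
    stain TEXT,
    evaluation TEXT,
    comments TEXT,
    PRIMARY KEY (tma_id, core_id)
);
CREATE TABLE cores_5x (
    tma_id TEXT,
    core_id INTEGER,
    x INTEGER, y INTEGER, w INTEGER, h INTEGER,  -- bounding box at this magnification
    core_image BLOB,
    PRIMARY KEY (tma_id, core_id),
    FOREIGN KEY (tma_id, core_id) REFERENCES cores_thumbnail_TMA (tma_id, core_id)
);
""")

# Hypothetical sample data, for illustration only.
conn.execute("INSERT INTO tma VALUES ('TMA-01', NULL)")
conn.execute(
    "INSERT INTO cores_thumbnail_TMA VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("TMA-01", 1, "valid", None, 10, 10, 30, 30, "GIST", "KIT IHC", None, None),
)
rows = conn.execute(
    "SELECT core_id, validity, x, y FROM cores_thumbnail_TMA WHERE tma_id = ?",
    ("TMA-01",),
).fetchall()
```

Keying every table on the pair (TMA ID, core ID) is what allows a core found in the thumbnail to be joined to its image at any magnification.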

III. MATERIALS FOR IMAGE PROCESSING

The acquisition of the digital images has been carried out using the robotized microscope ALIAS II and the Aperio ScanScope. The ALIAS II microscope (LifeSpan Biosciences Inc.) has lenses for magnifications of 1.24× (thumbnail), 5×, 10×, 20×, and 40×, an LED-type light source, and a large format camera with a capacity of 2048 × 2048 pixels.

Four TMA datasets with a total of 21244 cores have been processed. The datasets are as follows.

a) A dataset composed of 9 TMAs, 5 of gastrointestinal stromal tumors (GISTs) with brown staining for KIT IHC and 4 of breast cancer stained with hematoxylin and eosin (HE-BC), prepared with a manual tissue arrayer, composed of 56 cores, and digitized with the motorized microscope ALIAS II at 5×, 10×, 20×, and 40×.

b) A dataset composed of 15 TMAs (14 GISTs and 1 of HE-BC) prepared with a manual tissue arrayer, composed of 56 cores/TMA, and digitized with Aperio ScanScope T2 at 40×.

c) A database composed of 10 TMAs stained with IHC against D2-40, anti-CD34 antibodies, and Alcian blue for angiogenesis research. This dataset was prepared with an automatic tissue arrayer, composed of 70 cores/TMA, and digitized with Aperio ScanScope T2 at 20× and 40×.

d) A database composed of 384 TMAs stained with IHC against anti-CD123 antibodies with the method ENDVISION™ FLEX (DAKO) using a diaminobenzidine (DAB) chromogen for breast cancer analysis. This dataset was prepared with an automatic tissue arrayer, composed of 50 cores/TMA, and digitized with Aperio ScanScope T2 at 40×.
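As a quick consistency check, the per-dataset core counts can be recomputed from the list of datasets. The only assumption is that every TMA in a dataset holds the stated number of cores, including the 56 cores quoted for dataset (a); the variable names are illustrative.

```python
# (number of TMAs, cores per TMA) for datasets (a) to (d), as listed in this section.
datasets = {"a": (9, 56), "b": (15, 56), "c": (10, 70), "d": (384, 50)}

cores = {name: tmas * per_tma for name, (tmas, per_tma) in datasets.items()}
total = sum(cores.values())          # 504 + 840 + 700 + 19200 = 21244
assessed = cores["a"] + cores["b"]   # 504 + 840 = 1344
```

The total reproduces the 21244 cores quoted for the four datasets, and datasets (a) and (b) together give the 1344 cores used for the quantitative assessment in Section IV.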

The TMA paraffin blocks were prepared at different institutions with different biopsy core needle sizes and, as aforementioned, with both manual and automatic tissue arrayers. Therefore, the datasets cover a full range of core sizes of 1, 1.5, and 2 mm diameter. Furthermore, the resolution and acquisition method, due to the CMOS sensor size, is different for the scanner than for the microscope. The scanner resolution (μm/pixel) is 1.2 times higher than that of the microscope, i.e., 0.47 at 20× and 0.23 at 40× for the scanner (see Table I). The CMOS size, for the devices tested in this paper, is 3 × 2098 pixels and 2048 × 2048 pixels for the scanner and microscope, respectively. Thus, with the microscope, the acquisition of microscopic fields is square-by-square, from the upper left corner to the lower right one, and the final image is a mosaic composed of multiple files of 2000 × 2000 pixels each. The Aperio ScanScope T2 uses a linear camera, where the acquisition file corresponds to a strip set of 1000 × D, where D varies between 72098 and 87891 pixels in length.

The algorithms were implemented in C/C++ using Intel's Integrated Performance Primitives. The MySQL relational database was used for data storage; MySQL Connector/Net 6.2.2 was used to connect our code with the database, and MySQL Workbench 5.2 OSS to manage the development, administration, and creation of database diagrams.

IV. RESULTS

The datasets (a) and (b), composed of 1344 tissue cores, have been used here to quantitatively assess the algorithms. These datasets are the ones that present larger variations in the tissue cores in terms of distortions, alignment problems, and colors. This is mainly due to the use of the manual tissue arrayer for assembling the cores as well as the variety of stains.

Although the number of TMA cores was large, computational times were not deemed excessive. Extracting each core at different magnifications increases the computational time. The main problem is at large magnifications, because the number of tiles to be joined to form a core is high: on average, about 1 or 2 tiles for 5× images, 2 or 4 tiles for 10× images, 4 or 6 tiles for 20× images, and 8 or 12 tiles for 40× images. Fig. 8 shows a scatter plot of the computational time for extracting all cores at different magnifications, for TMA images digitized with both the motorized microscope and the scanner. The core extraction and archiving algorithm takes on average 0.6 s for a thumbnail image (60 KB) and 5 min for a 40× image (350 MB). Experiments were performed on an Intel Core i7 at 3.06 GHz with 12 GB of RAM.

The quantitative validation is based on ROC analysis carried out with both tissue samples, GIST and HE-BC. The results of the algorithms were compared to the manual selection of TMA cores, done by pathologists from the local hospital (HGUCR). This consisted of a visual inspection of the 5× TMA in which those cores suitable for diagnosis were selected. That is, cores that were not distorted and had enough tissue to carry out the diagnosis were selected by the pathologists. Thus, the percentages of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) detections, and the area under the ROC curve, were calculated. The ROC analysis for the thumbnail core extraction algorithm with the motorized microscope and the scanner for both tissue samples is shown in Fig. 9.

In the case of GIST samples digitized with the microscope, an average of 1.38% detections were FN, 2.47% were FP, 98.62% were TP (sensitivity), 97.53% were TN (specificity), and the area under the ROC curve, $A_z$, was equal to 0.915. In the case of GIST samples digitized with the scanner, an average of 0.84% detections were FN, 1.14% were FP, 99.16% were TP, 98.86% were TN, and $A_z$ = 0.937. For HE-BC digitized with the microscope, an average of 0.40% detections were FN, 4.27% were FP, 99.60% were TP, 95.73% were TN, and $A_z$ = 0.964. And in the case of HE-BC digitized with the scanner, an average of 0.55% detections were FN, 0.63% were FP, 99.45% were TP, 99.37% were TN, and $A_z$ = 0.967.
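The percentages in this paragraph are rates: TP and FN are expressed relative to the cores the pathologists selected (so they sum to 100%), and TN and FP relative to the cores they rejected. A minimal Python sketch of that bookkeeping follows; the function name and the counts in the usage example are hypothetical and are not taken from the paper.

```python
def detection_rates(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Return TP/FN/TN/FP as percentages of the positive and negative classes."""
    positives = tp + fn  # cores selected by the pathologists
    negatives = tn + fp  # cores rejected by the pathologists
    if positives == 0 or negatives == 0:
        raise ValueError("both classes need at least one sample")
    return {
        "TP%": 100.0 * tp / positives,  # sensitivity
        "FN%": 100.0 * fn / positives,  # type II error rate
        "TN%": 100.0 * tn / negatives,  # specificity
        "FP%": 100.0 * fp / negatives,  # type I error rate
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in detection_rates(tp=493, fn=7, fp=3, tn=117).items():
        print(f"{name}: {value:.2f}")
```

This explains why each reported pair is complementary, for example 1.38% FN with 98.62% TP for the GIST samples digitized with the microscope.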

The largest type I error, that is, FP with 4.27%, occurs with the HE-BC samples digitized with the microscope. The largest type II error, that is, FN with 1.38%, occurs with the GIST samples, also digitized with the microscope. Most of the FP errors are due to detected pieces that are actually air bubbles, glue, or fragments of other cores. Most of the FN errors are due to TMA cores that are missed because of their very weak stain. This is illustrated in Fig. 10.

The accuracy (ACC) and Matthews correlation coefficient (MCC) were also calculated. The MCC is a correlation coefficient between the truth values and the detected ones; it lies between −1 and +1. A coefficient of +1 represents a perfect prediction, 0 an average random prediction, and −1 an inverse prediction.

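$$\mathrm{ACC} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}} \tag{6}$$

$$\mathrm{MCC} = \frac{(\mathrm{TP} \cdot \mathrm{TN}) - (\mathrm{FP} \cdot \mathrm{FN})}{\sqrt{(\mathrm{TP} + \mathrm{FP})(\mathrm{TP} + \mathrm{FN})(\mathrm{TN} + \mathrm{FP})(\mathrm{TN} + \mathrm{FN})}} \tag{7}$$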
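Equations (6) and (7) can be evaluated directly from the four confusion-matrix counts. The sketch below is a Python illustration rather than the paper's C/C++ code; it uses the standard accuracy denominator TP + TN + FP + FN, and returning 0 when the MCC denominator vanishes is a common convention assumed here, not something stated in the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Eq. (6): fraction of all detections that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Eq. (7): Matthews correlation coefficient, in the range [-1, +1]."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: undefined cases are reported as 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(accuracy(45, 45, 5, 5))      # 0.9
    print(matthews_cc(45, 45, 5, 5))   # 0.8
    print(matthews_cc(50, 50, 0, 0))   # 1.0, perfect prediction
    print(matthews_cc(0, 0, 50, 50))   # -1.0, inverse prediction
```

Unlike accuracy, the MCC uses all four counts symmetrically, so it remains informative when accepted and rejected cores are unevenly represented.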

Fig. 11. TMA core segmentation at different magnifications: 5×, 10×, 20×, and 40×. Results with the GISTs and HE-BC TMA database. (a) GIST TMA sample digitized with the motorized microscope. (b) HE-BC TMA sample digitized with the motorized microscope. (c) GIST TMA sample digitized with the scanner.

Both the ACC and MCC give good results. In the case of the TMA digitized with the motorized microscope, an average of 98% ACC with 0.96 MCC is obtained for the GIST samples and an average of 97% ACC with 0.95 MCC is obtained for the HE-BC samples. In the case of the TMA digitized with the scanner, an average of 98% ACC with 0.97 MCC is obtained for the GIST samples and an average of 99% ACC with 0.98 MCC is obtained for the HE-BC samples.

Fig. 12. TMA core segmentation at 40× magnification. Results with the TMA databases stained with IHC against D2-40, anti-CD34 and anti-CD123 antibodies and digitized with the scanner. (a) Alcian blue TMA sample digitized with the scanner. (b) TMA sample stained with IHC against anti-CD123.

Examples of core segmentation results for different GIST and HE-BC TMA samples digitized with both devices are shown in Fig. 11. On the whole database, an average of 98% accuracy was obtained. This result is encouraging if we take into account that most of the glass slides have imperfections, that is, weak stain, dark background, glue or air bubbles, and missed or distorted cores. Therefore, our system improves on previous results reported in the literature, such as TMABoost [11], which obtains 96.8% accuracy. Moreover, it is completely automatic, as opposed to the commercial tools, which require manual processing. The system was also tested with databases (c) and (d), composed of 700 and 19200 cores, respectively. Database (c) was generated at INCLIVA Institute for angiogenesis research [25] and database (d) was generated at Hospital Verge de la Cinta for breast cancer research. The system uses a fixed parameter set for the core detection and selection processes. The parameters of the selection process are, however, calculated for each TMA, such as core size, percentage of background pixels, and percentage of pixels with a gray value similar to those of the air bubbles. An average of 99% ACC was obtained for databases (c) and (d). Some results are shown in Fig. 12. Thus, the validity of our tool is demonstrated.
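The core totals quoted in this section follow from the dataset descriptions, as the short Python check below illustrates. It assumes that every array position is filled and that the 56 cores given for dataset (a) are per TMA, as stated explicitly for dataset (b); the names `DATASETS` and `total_cores` are hypothetical.

```python
# (number of TMAs, cores per TMA) for each dataset described in the text.
DATASETS = {
    "a": (9, 56),    # microscope, manual arrayer
    "b": (15, 56),   # scanner, manual arrayer
    "c": (10, 70),   # scanner, automatic arrayer
    "d": (384, 50),  # scanner, automatic arrayer
}


def total_cores(*names: str) -> int:
    """Total number of cores over the named datasets."""
    return sum(DATASETS[n][0] * DATASETS[n][1] for n in names)


if __name__ == "__main__":
    print(total_cores("a", "b"))  # 1344 cores used for quantitative validation
    print(total_cores("c"))       # 700
    print(total_cores("d"))       # 19200
```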

V. CONCLUSION

This paper has described a specific tool to automatically perform the segmentation and archiving of TMA cores in microscopy images at 5×, 10×, 20×, and 40×. The tool shows promising results in segmenting different microscopic images from TMA glass slides with different imperfections and image quality.

A dataset of 1344 TMA cores composed of 1016 GIST and 328 HE-BC core samples has been used to quantitatively test and illustrate the core extraction and archiving algorithm. The tests carried out show that the algorithm is both fast and accurate. An average of 98% accuracy with 0.965 MCC and an area under the ROC curve of 0.946 is obtained for the cores digitized with both WSI devices, a robotic microscope and a scanner. Working with a scanner is not common. The software for using the scanner is closed and proprietary, so interaction between a scanner and another application may not be possible.
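The summary figures of 0.965 MCC and 0.946 area under the ROC curve are consistent with unweighted means of the four per-group values reported in the Results section. The Python check below illustrates this; that the authors used a simple unweighted mean is an assumption inferred from the numbers, not something the text states.

```python
from statistics import mean

# Per-group values as reported in the Results section.
MCC = {
    ("GIST", "microscope"): 0.96,
    ("HE-BC", "microscope"): 0.95,
    ("GIST", "scanner"): 0.97,
    ("HE-BC", "scanner"): 0.98,
}
AZ = {
    ("GIST", "microscope"): 0.915,
    ("GIST", "scanner"): 0.937,
    ("HE-BC", "microscope"): 0.964,
    ("HE-BC", "scanner"): 0.967,
}

mean_mcc = mean(MCC.values())
mean_az = mean(AZ.values())

if __name__ == "__main__":
    print(f"mean MCC = {mean_mcc:.3f}")
    print(f"mean Az  = {mean_az:.3f}")
```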

The tool improves upon previous systems described in the literature, which are not completely automatic. It addresses the problem of automatically handling high-dimensional microscopic TMA images at different magnifications. Furthermore, it has been shown that our system, with a fixed parameter set, is suitable for segmenting different TMA samples stained with a range of biomarkers. The system is also flexible enough to work with TMAs assembled by both manual and automatic tissue arrayers.

REFERENCES

[1] W. Chen, M. Reiss, and D. Foran, “A prototype for unsupervised analysis of tissue microarrays for cancer research and diagnostics,” IEEE Trans. Inf. Technol. Biomed., vol. 8, no. 2, pp. 89–96, Jun. 2004.

[2] R. Dell’Anna, F. Demichelis, A. Sboner, and M. Barbareschi, “An automated procedure to properly handle digital images in large scale tissue microarray experiments,” Comput. Methods Programs Biomed., vol. 79, no. 3, pp. 197–208, 2005.

[3] D. Rimm, R. Camp, L. Charette, D. Olsen, and M. Reiss, “Tissue microarray: A new technology for amplification of tissue resources,” Cancer, vol. 7, no. 1, pp. 24–31, 2001.

[4] S. M. Dhanasekaran, T. Barrette, D. Ghosh, R. Shah et al., “Delineation of prognostic biomarkers in prostate cancer,” Nature, vol. 412, pp. 822–826, 2001.

[5] K. A. Kuraya, R. Simon, and G. Sauter, “Tissue microarrays for high-throughput molecular pathology,” Ann. Saudi Med., vol. 24, pp. 169–74, Jan. 2004.

[6] C. Liu, K. Montgomery, Y. Natkunam, R. West, T. Nielsen, M. Cheang, D. Turbin, R. Marinelli, M. V. de Rijn, and J. Higgins, “TMA-combiner, a simple software tool to permit analysis of replicate cores on tissue microarrays,” Mod. Pathol., vol. 18, pp. 1641–1648, 2005.

[7] D. Nohle, B. Hackman, and L. Ayers, “The tissue micro-array data exchange specification: A web based experience browsing imported data,” BMC Med. Inf. Decision Making, vol. 5, no. 25, 2005.

[8] A. Rabinovich, S. Krajewski, M. Krajewska et al., “Framework for parsing, visualizing and scoring tissue microarray images,” IEEE Trans. Inf. Technol. Biomed., vol. 10, no. 2, pp. 209–219, Apr. 2006.

[9] T. Fuchs and J. M. Buhmann, “Computational pathology: Challenges and promises for tissue analysis,” Comput. Med. Imag. Graph., vol. 35, no. 7–8, pp. 515–530, Oct.–Dec. 2011.

[10] V. Della Mea, I. Bin, M. Pandolfi, and C. D. Loreto, “A web-based system for tissue microarray data management,” Diagnost. Pathol., vol. 1, pp. 31–36, 2006.

[11] F. Demichelis, A. Sboner, M. Barbareschi, and R. Dell’Anna, “TMAboost: An integrated system for comprehensive management of tissue microarray data,” IEEE Trans. Inf. Technol. Biomed., vol. 10, no. 1, pp. 19–27, Jan. 2006.


[12] S. Stromberg, M. Bjorklund, C. Asplund, A. Skollermo et al., “A high-throughput strategy for protein profiling in cell microarrays using automated image analysis,” Proteomics, vol. 7, pp. 2142–2150, 2007.

[13] R. Shaknovich, A. Celestine, L. Yang, and G. Cattoretti, “Novel relational database for tissue microarray analysis,” Arch. Pathol. Lab. Med., vol. 127, pp. 492–494, 2003.

[14] C. Liu, W. Prapong, Y. Natkunam, A. Alizadeh, K. Montgomery, C. Gilks, and M. Rijn, “Software tools for high-throughput analysis and archiving of IHC staining data obtained with microarrays,” Amer. J. Pathol., vol. 161, no. 5, pp. 1557–1565, 2002.

[15] J. Morgan, C. Iacobuzio-Donahue, B. Razzaque, D. Faith, and A. D. Marzo, “TMAJ: Open source software to manage a tissue microarray database,” in Proc. APIII Meet., 2003.

[16] Aperio ePathology Solutions. TMALab Microarray Analysis Tool. [Online]. Available: http://www.aperio.com/pathology-services/analyze-tma-slides-software.asp

[17] ALPHELYS Integrated Solution for Pathology. TMADesigner2. [Online]. Available: http://www.alphelys.com/alph01/prod/us/tmadesigner2-/tmadesigner2.php

[18] G. Thallinger, K. Baumgartner, M. Pirklbauer et al., “Tamee: data management and analysis for tissue microarrays,” BMC Bioinformat., vol. 8, no. 81, pp. 1471–2105, 2007.

[19] M. García-Rojo, G. Bueno, C. Peces, J. Gonzalez, and M. Carbajo, “Critical comparison of 31 commercially available digital slide systems in pathology,” Int. J. Surg. Pathol., vol. 14, no. 4, pp. 285–305, Oct. 2006.

[20] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed. Englewood Cliffs, NJ, USA: Prentice–Hall, 2007.

[21] S. Suzuki and K. Abe, “Topological structural analysis of digitized binary images by border following,” Comput. Vis., Graph., Image Process., vol. 30, no. 1, pp. 32–46, Apr. 1985.

[22] C. Aguilar, M. Fernández, J. Vidal, N. Vallez, O. Déniz, J. Salido, and G. Bueno, “Unión automática de imágenes microscópicas de alta resolución,” in Proc. Congreso Anual de la Sociedad Española de Ingeniería Biomédica (CASEIB), Nov. 16–18, 2011, pp. 10–17.

[23] P. Thevenaz and M. Unser, “User-friendly semiautomated assembly of accurate image mosaics in microscopy,” Microsc. Res. Tech., vol. 70, no. 2, pp. 135–146, Feb. 2007.

[24] Ch. Daniel, M. García Rojo, J. Klossa, V. Della Mea, D. Booker, B. Beckwith, and T. Schrader, “Standardizing the use of whole slide images in digital pathology,” Comput. Med. Imag. Graph., vol. 35, no. 7–8, pp. 496–505, Jan. 2011.

[25] M. Fernández, I. Tadeo, R. Noguera, M. García-Rojo, O. Déniz, J. Salido, and G. Bueno, “A morphometric tool applied to angiogenesis research based on vessel segmentation,” in Proc. 11th Eur. Congr. Telepathol., 5th Int. Congr. Virtu. Microsc., 2012, pp. 60–64.

M. del Milagro Fernández-Carrobles received the Graduation degree in computer science and the Master's in physics and mathematics from the Universidad de Castilla-La Mancha (UCLM), Ciudad Real, Spain, in 2010 and 2011, respectively.

She is currently working at VISILAB (Machine Vision and Intelligence Systems Group) at UCLM. Her research interests include image processing, artificial intelligence, and microscopic analysis.

Gloria Bueno received the M.Sc. in control engineering and physical science from Universidad Complutense de Madrid, Madrid, Spain, in 1993, and the Ph.D. degree in machine vision from Coventry University, Coventry, U.K. She has experience working as a Principal Researcher in several research centers, such as Centre national de la recherche scientifique, Louis Pasteur University, Strasbourg, France (1998–2000), and The Central European Institute of Technology, San Sebastian, Spain (2000–2002). She is the author of 2 patents and more than 70 refereed papers. Her research interests include multimodality signal processing, parallel computing, and artificial intelligence.

Oscar Déniz received the M.Sc. and Ph.D. degrees in computer science from Universidad de Las Palmas de Gran Canaria (ULPGC), Las Palmas, Spain, in 1999 and 2006, respectively.

He was an Associate Professor at ULPGC from 2003 to 2007 and is currently at the Universidad de Castilla-La Mancha, Ciudad Real, Spain. His main research interests include signal processing and computer vision.

Dr. Déniz is a Research Fellow of the Institute of Intelligent Systems and Numerical Applications in Engineering and a member of AEPIA and AERFAI. He was a national finalist of the 2009 Cor Baayen award for young researchers in computer science.

Jesús Salido received the Electrical Engineer degree and the Ph.D. in robotics and artificial intelligence from the Universidad Politécnica de Madrid, Madrid, Spain, in 1989 and 1996, respectively.

From October 1996 to January 1999, he was a Postdoctoral Researcher (Visiting Scholar) at The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA. He has been an Associate Professor since February 1999 at the School of Computer Science, Universidad de Castilla-La Mancha, Ciudad Real, Spain. He was a Consulting Engineer for more than six years. His scientific research interests include intelligent systems and computer vision applications.

Marcial García-Rojo received the Master's in microcomputer science in 1994 and the Ph.D. degree in medicine from Universidad Autónoma de Madrid, Madrid, Spain, in 1995.

He leads the Department of Anatomic Pathology at the Hospital General Universitario de Ciudad Real, Ciudad Real, Spain. He is the author of more than 100 refereed papers. His research interests include medical informatics and the molecular pathology of cancer, with a focus on the study of human papilloma virus in cervical cancer and the expression of molecular biomarkers in cancer.

