IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, VOL. 7, NO. 1, MARCH 2003

Histological Image Retrieval Based on Semantic Content Analysis

H. Lilian Tang, Rudolf Hanka, Member, IEEE, and Horace H. S. Ip, Member, IEEE

Abstract—The demand for automatically recognizing and retrieving medical images for screening, reference, and management is growing faster than ever. In this paper, we present an intelligent content-based image retrieval system called I-Browse, which integrates both iconic and semantic content for histological image analysis. The I-Browse system combines low-level image processing technology with high-level semantic analysis of medical image content through different processing modules in the proposed system architecture. Similarity measures are proposed and their performance is evaluated. Furthermore, as a byproduct of semantic analysis, I-Browse allows textual annotations to be generated for unknown images. As an image browser, apart from retrieving images by image example, it also supports query by natural language.

Index Terms—Histological image analysis, image annotation, medical image database, semantic analysis.

I. INTRODUCTION

MEDICAL images play a central role in patient diagnosis, therapy, surgical planning, medical reference, and training. The development of systems for diagnosing, screening, archiving, and annotating based on automatic analysis of medical images is a recurring research topic. In particular, with the advent of digital imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and single-photon emission computerized tomography (SPECT), as well as images digitized from conventional media such as histological slides and X-rays, collections of medical images are increasingly being held in digital form. How to build up medical databases and effectively use these sophisticated data for efficient clinical applications is a challenging research issue.

In a conventional medical image database, most indexing and retrieval operations have been based on patient identity, date, type of examination, image number, or other information contained in the image record. However, the information at a higher level of abstraction, inherent in the images, is far different from the kinds of representations that suit textual information. Moreover, as the use of multimedia in healthcare extends, more information could be utilized if image databases could be organized and retrieved based on image content, especially at the semantic level.

Manuscript received July 2, 2001. This work was supported by the Hong Kong Jockey Club Charities Trust.

H. L. Tang is with the Department of Computing, University of Surrey, Guildford, Surrey GU2 7XH, U.K. (e-mail: [email protected]).

R. Hanka was with the Medical Informatics Unit, University of Cambridge, Cambridge CB1 2ES, U.K. (e-mail: [email protected]).

H. H. S. Ip is with the Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong (e-mail: [email protected]).

Digital Object Identifier 10.1109/TITB.2003.808500

This would also make possible the fusion and referencing of medical information extracted from different media sources.

Research in content-based image retrieval today is a lively discipline, expanding in breadth [1], as access to visual information is performed not only at a conceptual level, using keywords as in the textual domain, but also at a perceptual level, using objective measurements of visual content [2]. There have been a large number of special issues dedicated to the topics of content-based indexing and retrieval in recent years by many leading journals. Content-based image indexing and retrieval have commonly been based upon nonsemantic approaches employing primitive image information such as texture [3], shape [4]–[6], color [7], spatial relationships [8], or mixtures of these features [9]–[11] to facilitate the retrieval process. However, in many domain-specific applications, such as medical image databases, the semantic content is more desirable because it facilitates high-level application tasks [12]. One key issue to be faced is the identification and extraction of semantic information from the visual data. One approach to solving this problem is to associate high-level semantic information with low-level visual data. Several systems that attempt to bridge this information gap can be found in [13]–[17]. A comprehensive survey by Smeulders et al. on content-based retrieval, the role of the semantic gap, and the challenges for such research can be found in [1]. The work presented in this paper contributes toward semantic content analysis of medical images.

Research on analyzing images from modalities such as X-ray, CT, and MRI is extensive and active. One example, the ASSERT system under development at Purdue University [18], focuses mainly on CT images of the lung. It is based on a human-in-the-loop design in which a physician delineates the pathology-bearing regions and a set of anatomical landmarks in the image for the computer to extract attributes. A review of content-based indexing for medical images can be found in [19]. Compared with work on other types of medical images, histological image analysis is relatively rare. This is partly because the digital format of such data is relatively difficult to obtain, but mostly because histological images are far more complicated and diverse than other types of images in terms of, for example, the variations of colors, object appearances, and semantic interpretations at different magnifications. By focusing on a particular feature, some successful results were reported by Schnorrenberg et al. [20] for breast cancer nuclei detection, as well as by Hamilton et al. [21] on distinguishing normal and abnormal histological tissues. Schnorrenberg reported an 83% nucleus detection sensitivity compared to the experts' result, while Hamilton claimed that about 83% of test images were correctly classified as normal and abnormal tissues.

Further research on a wider range of images and problems is still needed, and there is still a long way to go before fully automated systems are launched to assist clinical work. Recently, an ongoing project at the University of Pittsburgh School of Medicine has been building a tissue banking information system [22], and QinetiQ in the U.K. has been developing a visual hospital project based on breast histological image analysis [23].

This paper describes an approach and the techniques for semantic content-based retrieval of histological images. The resulting prototype system, called I-Browse, not only enables a user, e.g., a physician, to search over image archives through a combination of iconic and semantic contents, but also automatically generates textual annotations for input images. In summary, I-Browse satisfies the following objectives:

1) to find similar images from the archive by image example in terms of visual similarity;

2) to interpret visual properties as histological features in a similar way to doctors;

3) to generate textual annotations for unknown images;

4) to find similar images from the archive by image example in terms of histological or semantic similarity;

5) to retrieve images using natural language queries;

6) to act as a postman or a classifier, i.e., to put an unknown image into the correct "pigeon hole," telling where along the gastrointestinal (GI) tract the image was taken.

Histological images, like other types of medical images, frequently give rise to ambiguity in interpretation and in diagnosis. Medical images derived from a specific organ are visually similar and usually differ only in small details, but such subtle differences may be of pathological significance. Our claim is that current content-based image retrieval techniques using primitive image characteristics such as texture, colors, and shapes are insufficient for medical images. A major goal of the work reported in this paper is to demonstrate that semantic analysis of the image content plays a critical role in the understanding and retrieval of medical images. The techniques presented in this work can potentially be generalized for analyzing different types of complex images through the integration of both low-level image analysis and high-level semantic reasoning.

Unlike other systems that normally look at several features in a single organ, in this work we focus on a range of histological images originating from six organs along the GI tract. The GI tract is essentially a muscular tract lined by a mucous membrane, which exhibits regional variations in structure reflecting the changing functions of the system from the mouth to the anus, as shown in Fig. 1, which is adapted from [24, p. 248]. We mainly aim at six areas along the tract, i.e., esophagus, stomach, small intestine (small bowel), large intestine (large bowel), appendix, and anus.

The rest of this paper is organized as follows. In Section II, we introduce the definition of the semantic features and histological labeling. In Section III, we review the I-Browse architecture and its major functional components. The processing cycle for iconic and semantic analysis is presented in Sections IV and V, respectively. In Section VI, retrieval based on semantic content and the associated similarity measures are introduced together with an evaluation of the approach. Finally, we conclude the paper in Section VII.

Fig. 1. Diagram of the GI tract. (a) Six organs along the GI tract. (b) General structure of the tract.

II. DEFINING THE SEMANTIC FEATURES OF HISTOLOGICAL IMAGES

The semantic interpretation ability of I-Browse means that the system is able to achieve objectives 2–6 stated in Section I. For this reason, we first define what semantic units, or salient histological features, need to be automatically extracted from the image through a series of image analysis operations in the retrieval system. In consultation with histopathologists, we defined two sets of relevant histological features in GI tract images, also called histological labels or semantic labels. These labels become the basic semantic units for producing high-level information in the system.

In this research, we focus on images digitized at a magnification of ×50. A human expert can identify most of the useful histological features at this resolution. More than 1500 images were digitized in our collection. Under a given magnification the histopathologist may see several levels of features. As shown in the schematic diagram in Fig. 1(b) and the actual histological images in Fig. 2(a)–(c), the GI canal is identified by its tubular nature and the division of its wall into five distinct layers, namely, Lumen (L), Mucosa (M) including Muscularis Mucosae (MM), Submucosa (S, or SubM), Muscularis Externa (E or ME), and Serosa or Adventitia (A).

Fig. 2. Image examples from the GI tract. (a) Small intestine. (b) Stomach. (c) Small intestine. (d) Examples of image features at coarse and fine levels.

For example, the images in Fig. 2(a) and (b), which are taken from the small intestine and stomach, respectively, demonstrate this structure. We call these features Coarse Features, and they apply to all the organs along the GI tract. As seen in the diagram and the sample images in Fig. 2, even though there may be significant visual differences between images from the GI tract, whether from the same or different organs and from the same or different specimens, they all share similar color and cross proportions with respect to the other coarse divisions. The perceived spatial arrangement of the coarse feature regions provides an overall structural description of the image content that is important for the later reasoning procedures. We define ten basic coarse regions as given in Table I(a), which include the five regions mentioned above and the junctions between their outer boundaries. There are five extra coarse features in Table I(a), which are derived from some of these ten coarse regions; e.g., X is a combination of A and S, and Z means a feature appears everywhere. This is to facilitate the definition of the fine level of semantic features.

The fine level features are used to distinguish different visual appearances within each coarse region. In particular, we focused on the following distinctive differences.

1) The difference between fine features appearing in different organs, like the villi in the small intestine in Fig. 2(a) and the fundus glands in the stomach in Fig. 2(b). In both images these fine features (villi and fundus glands) belong to the same coarse region, "Mucosa."

2) The different appearance of fine features, in the same or different images, appearing in the same coarse region of the same organ, e.g., the intestinal glands in Fig. 2(a) (the area covered by the label text itself) and the villi region (in the oval area). Both belong to the coarse region "Mucosa."

3) The varied appearances within the same histological feature, e.g., the villi region in Fig. 2(a) and the villi region in Fig. 2(c). The reason the villi details in (a) and (c) look so different, even though both images come from the same organ (small intestine), is mainly the cutting angle or a twist of the tissues introduced when preparing the slide. In this case, either their coarse structural arrangement [note the coarse feature distribution in Fig. 2(c)] or the fine details in each region are significantly different.

TABLE I. (a) Coarse feature labels. (b) Part of the 63 fine feature labels.

As a result of such analysis, two levels of histologically meaningful interpretations of the images are defined: the coarse feature level and the fine feature level. The principle for defining fine level features is not just to discriminate objects in the images in terms of histological meaning, but also to discriminate the visual variation of the same objects. For example, the villi in Fig. 2(a) and (c) are regarded as two kinds of fine features to facilitate a better performance of the feature classifiers. Such features are regrouped later in the semantic analysis for the purpose of generating the semantic content of the image. A fine feature can be a common feature appearing in many coarse regions in any organ, such as blood, or a specific feature that appears only in certain regions of one organ, like the intestinal glands of the small intestine.

Fig. 3. Process flow for feature sample selection and semantic labeling.

Currently we have defined the semantics for 15 coarse features and 76 fine feature labels. Among the 76 fine feature labels, in practice it is sufficient to adopt 63 of them in the system. The coarse feature definitions are shown in Table I(a), and examples of the 63 fine features are given in Table I(b). Examples of the mapping between visual appearances and their fine and coarse features are given in Fig. 2(d). In summary, a visual feature can be mapped to different levels of semantic meanings, terms, and labels. In this system, each visual feature is mapped to two levels of terms, coarse and fine. Any fine level feature can be grouped or subsumed under one of the classes at the coarse level.
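This two-level taxonomy amounts to a simple subsumption mapping from fine labels to coarse classes. A minimal sketch in Python, with hypothetical label names standing in for the entries of Table I:

    # Hypothetical fine-to-coarse subsumption table; the names are
    # illustrative stand-ins for the real entries of Table I.
    FINE_TO_COARSE = {
        "villi": "M",              # Mucosa
        "fundus_glands": "M",      # Mucosa
        "intestinal_glands": "M",  # Mucosa
        "blood": "Z",              # Z: a feature that may appear everywhere
        "adipose_tissue": "X",     # X: combination of A and S
    }

    def coarse_label(fine_label: str) -> str:
        """Subsume a fine label under its coarse class."""
        return FINE_TO_COARSE[fine_label]

    # Any fine-level detector output can thus also be read as a coarse-level
    # result, which is how the semifine detectors are used in Section III.
    assert coarse_label("villi") == "M"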

To facilitate the process of collecting sufficient salient visual feature samples relating to the histological meanings at both coarse and fine levels, a specially designed knowledge elicitation subsystem was developed to enable a histopathologist to interactively assign, from a computer interface, these histological feature labels (semantic labels) to a large set of subimages randomly selected from the GI tract image collection (Fig. 3, the computer interface on the left). Fig. 2(d) shows some of the subimage samples selected by a histopathologist, which have been mapped to the corresponding semantic labels. These associations of histological labels with subimages depicting the various visual characteristics of the labels were then stored as the ground truth and formed the initial set of training samples and testing data for designing the feature extraction algorithms and the semantic label classifiers for the subsequent image recognition processes. There were in total 9049 subimage samples extracted from 274 images that were randomly selected from about 1500 images in the collection. Among these subimage samples, 2737 were used for training; the rest were divided into two sets for testing. Fig. 3 shows the knowledge elicitation module and the process flow of analyzing the visual and semantic features in unknown images. Details will be presented later.

III. I-BROWSE ARCHITECTURE AND PROCESSING CYCLE

To achieve the objectives for I-Browse, in particular to support the extraction and fusion of iconic and semantic information from histological images, we proposed the I-Browse architecture.

Fig. 4. System architecture.

It is composed of a set of disparate but complementary building blocks as shown in Fig. 4: the visual feature detector, domain knowledge base (KB), index constructor, image database, semantic analyzer (SA), annotation generator, and free text analyzer. The semantic analyzer and the visual feature detectors are the most essential and computationally intensive blocks in the system. The relationships and the workflow between them are shown in Fig. 3.

After associating histological meanings at the two levels with the visual features found on a subimage, two corresponding visual detectors were designed to extract similar visual features from unknown images. We call them coarse detectors and semifine detectors. In the visual feature detector building block of the I-Browse architecture, there are actually three types of detectors:

1) coarse detectors;
2) semifine detectors;
3) fine detectors.

The first two are general purpose and are used to identify all possible features in an unknown image at either a coarse or a fine level. Presently, the semifine feature detectors implemented in I-Browse are mainly detectors that capture the textural content of the images and classify the subimages of unknown images into the fine histological labels defined in Section II. Although these detectors are designed to recognize the fine level histological features in the images, we call them only "semifine" detectors, as they serve to produce an initial semantic labeling of subimages that may need to be further confirmed by a subsequent semantic analysis process. As mentioned in Section II, since any fine feature can be grouped under a coarse level feature, the result from the semifine detectors can also be mapped to histological labels at the coarse level.

On the other hand, for coarse features, as seen in Fig. 2, since color characteristics in stained tissue images are prominent within these coarse structures, color histogram measurements were used in conjunction with texture measurements to differentiate distinctive coarse regions. So at the coarse feature level, there are two independent sets of feature classification results, based on the subimage texture and color characteristics, respectively.

The semantic analyzer serves to improve the accuracy of the semantic labeling by identifying potentially incorrect assignments of histological labels to subimages by the visual detectors. This is achieved through a contextual analysis of these labels in concert with the relevant histological knowledge. There is an iteration loop, with feedback information, between the visual feature detector and the semantic analyzer.

TABLE II. List of specialized fine feature detectors that can be invoked through the semantic analyzer.

Fig. 5. Initial label map superimposed on a tissue image.

Furthermore, the semantic analyzer also triggers specialized fine feature detectors (the third type of visual detector) designed to confirm or refute uncertain labels. Currently, 20 such primitive fine feature detectors are implemented in the system, as listed in Table II. Explanations of the use of these fine detectors are given in Section V-B.

When a query image is submitted, or during the population process of the image database, the input image is first partitioned into a two-dimensional (2-D) array of subimages, and the coarse and fine features of each subimage are recognized by the coarse and semifine detectors. The initial result for the analyzed image is three 2-D arrays of semantic labels, two at the coarse level and one at the fine level. The parallel results from different detectors provide the semantic analyzer with cues for finding potentially erroneous detections. An example of these semantic labels superimposed on an input tissue image can be found in Fig. 5. With these label arrays, the semantic analyzer iteratively analyzes and corrects the labels according to the histological context in the knowledge base, and may produce a set of hypotheses on the labels associated with subimages if those labels are deemed to have been erroneously detected by the coarse and semifine detectors. Based on the hypotheses, a number of fine feature detectors are invoked to extract and confirm the visual features within the suspected regions.

This analysis-and-detection cycle iterates until the semantic analyzer finds a coherent result and no further changes are needed. Details of the visual feature detectors and the semantic analyzer are described in Sections IV and V.
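The cycle can be summarized in code. The following is a toy sketch, not the I-Browse implementation: detectors are stubbed as callables returning 2-D arrays of label ids, and disagreement between the two coarse maps stands in for the analyzer's much richer contextual reasoning.

    import numpy as np

    def coarse_by_color(tiles):    # stand-in for the color-histogram detector
        return np.zeros(tiles.shape[:2], dtype=int)

    def coarse_by_texture(tiles):  # stand-in for the Gabor-based detector
        return np.zeros(tiles.shape[:2], dtype=int)

    def semifine(tiles):           # stand-in for the semifine (fine-label) detector
        return np.zeros(tiles.shape[:2], dtype=int)

    def find_suspects(maps):
        # Flag grid positions where the two coarse classifications disagree.
        return list(zip(*np.nonzero(maps[0] != maps[1])))

    def analyze(tiles, max_iters=5):
        maps = [coarse_by_color(tiles), coarse_by_texture(tiles), semifine(tiles)]
        for _ in range(max_iters):          # iterate until the result is coherent
            suspects = find_suspects(maps)
            if not suspects:
                break
            for (i, j) in suspects:
                # Here I-Browse would invoke a specialized fine detector on the
                # suspect region; this toy version just trusts the texture result.
                maps[0][i, j] = maps[1][i, j]
        return maps

    tiles = np.zeros((13, 17, 64, 64, 3))   # a 13 x 17 grid of 64 x 64 subimages
    label_maps = analyze(tiles)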

The final label map is then used to construct the semantic content representation structure, Papillon, which is used to generate the textual annotation for the image in the database. "Papillon" is the codename for the internal semantic content representation used in I-Browse. It bridges information from different media (image and text), linking together the semantic analyzer, the Free Text Analyzer, and the Annotation Generator. In this system, the semantic content of an analyzed image is represented in Papillon. On the other hand, when the query is made in natural language, the Free Text Analyzer extracts the information in the query and converts it into Papillon. Therefore, whether the query is made by image or by text, its semantic content in Papillon is used for the retrieval. Details about the Papillon content, the Free Text Analyzer, and the Annotation Generator are outside the scope of this paper and can be found in [25].

The system is written in the C++ programming language with a five-tier architecture [26] that allows modules in different tiers to be developed independently. It is integrated with an existing patient diagnosis and documentation system, Pathos, which is extensively used in the pathology departments of the hospitals our medical collaborators are from. Pathos is an electronic patient record system for recording patients' pathological examination results, data, and diagnoses. I-Browse provides two front-end interfaces, called PIMS and Retrieval, for inputting and retrieving images, respectively. The generated annotation of the input image is stored in a database management system (DB2) through the PIMS interface. The index constructor creates iconic and semantic index content from the Visual Feature Detector and Papillon, respectively. The two types of index content serve different kinds of queries. The user may retrieve the desired image by inputting a query image or query text and selecting either the semantic or the iconic similarity measurement.

IV. ICONIC ANALYSIS OF VISUAL APPEARANCES

The histological slides were obtained from the past eight years of patient records at a local hospital. From these slides the histological images were captured under a high-resolution Leica microscope at ×50 magnification. The image resolution was set to the microscope's maximum value of 4491 × 3480 pixels during the capture process and then sampled down to 1123 × 870 pixels. We found through experiment that down-sampling the original image by a factor of four still retains sufficient pixels for window-based color and texture analysis while significantly reducing the computational load.

When the 1123 × 870 down-sampled image is divided into 64 × 64 subimages, we get 17 × 13 complete subimages. In other words, we take only those pixels that belong to complete subimage squares in the subsequent analysis process and ignore the small strips along the four boundaries of the image. However, this does not mean that the analysis will miss the features around the boundaries. Since the images were taken from patients' specimens, the specimen on one glass slide normally produces many images, and the digitization procedure allows some overlap between images. Therefore the area around the boundaries of one image may appear in other images and be analyzed there. We chose 64 × 64 subimages as the basic processing unit because we found through experimentation that this size is appropriate for the coarse and semifine feature detections based on Gabor filters.
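The tiling arithmetic is straightforward. A minimal sketch using the dimensions quoted above:

    import numpy as np

    H, W, TILE = 870, 1123, 64          # down-sampled image size, window size
    rows, cols = H // TILE, W // TILE   # 13 and 17 complete tiles
    print(cols, rows, cols * rows)      # -> 17 13 221

    image = np.zeros((H, W, 3), dtype=np.uint8)
    # Keep only pixels belonging to complete 64 x 64 squares; the thin strips
    # along the boundaries are dropped (they reappear in overlapping captures
    # of the same specimen).
    tiles = (image[:rows * TILE, :cols * TILE]
             .reshape(rows, TILE, cols, TILE, 3)
             .swapaxes(1, 2))           # shape: (13, 17, 64, 64, 3)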

For each subimage, the coarse feature detectors examine the normalized color and gray-level histograms of the subimage. The rotation-invariant histogram features of each subimage are passed to a three-layer neural network, which assigns coarse feature labels to the subimages. These can be any one of the 15 coarse histological features defined in Table I. In addition, another coarse label result can be obtained from the semifine detectors, whose fine feature detection result can be mapped into one of the coarse features.
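A sketch of this stage follows; the bin count, network size, and training data are invented here, and the simple gray-level histogram merely stands in for the paper's rotation-invariant color and gray-level features:

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def histogram_features(tile, bins=16):
        """Normalized gray-level histogram; rotation invariant by construction,
        since pixel positions are discarded."""
        gray = tile.mean(axis=2)                    # crude gray conversion
        hist, _ = np.histogram(gray, bins=bins, range=(0, 255))
        return hist / hist.sum()

    # Toy training set: random feature vectors with labels 0..14 standing in
    # for the 15 coarse histological classes of Table I(a).
    rng = np.random.default_rng(0)
    X = rng.random((300, 16))
    y = rng.integers(0, 15, size=300)

    # A "three-layer" network: input layer, one hidden layer, output layer.
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)

    tile = rng.random((64, 64, 3)) * 255
    coarse_id = clf.predict([histogram_features(tile)])[0]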

The semifine detector extracts texture measurements based on a set of Gabor features, as well as the gray-level mean and standard deviation of the histogram-normalized subimage. These semifine Gabor and statistical feature detections based on multiple-size windows [27] are also applied to the subimages. Except for the subimages corresponding to the feature boundaries, whose semifine features are computed from the original 64 × 64 window size, the features of the other subimages are computed in two different window sizes, 64 × 64 and 128 × 128. In our research, this multiple-window approach has been demonstrated to increase the accuracy rate of the histological feature classification compared with the single-window approach [27]. The accuracy of this initial assignment of histological labels to the test subimage sets ranges from 40% to 82%, depending on the type of histological feature. Details of the design of these feature detectors can be found in [27].
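The Gabor-plus-statistics featurization can be sketched as follows; the filter frequencies, orientations, and envelope width are illustrative choices, not the parameters used in [27]:

    import numpy as np
    from scipy.signal import fftconvolve

    def gabor_kernel(freq, theta, sigma=4.0, size=31):
        """Real Gabor kernel: an oriented sinusoid under a Gaussian envelope."""
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        xr = x * np.cos(theta) + y * np.sin(theta)
        envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
        return envelope * np.cos(2 * np.pi * freq * xr)

    def semifine_features(window):
        """Gabor response energies plus gray-level mean/std for one window."""
        feats = []
        for freq in (0.1, 0.2):                        # illustrative frequencies
            for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
                resp = fftconvolve(window, gabor_kernel(freq, theta), mode="same")
                feats.append(np.abs(resp).mean())      # energy of the response
        feats += [window.mean(), window.std()]         # statistical features
        return np.array(feats)

    window = np.random.rand(64, 64)
    # For the multiple-window variant, the same features computed on the
    # 128 x 128 neighborhood centered on the subimage would be concatenated.
    print(semifine_features(window).shape)             # -> (10,)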

Fig. 5 shows an example of the histological labels assigned to the subimages in an analyzed image. We call this labeling result a label map. The label map defines the semantic content and the spatial relationships of the various histological features found in the image. The two letters and the number in each subimage represent, respectively, the two coarse feature results and the fine feature result for that subimage.

In Fig. 5, the subimages manually marked "*" were assigned erroneous labels at the fine feature level, and the subimages marked "$" are examples where both coarse detectors assigned incorrect labels. Such mistakes are understandable, as the feature detectors have been designed to discriminate many types of features, and each feature can have many varied visual appearances. The variations are mainly caused by factors such as tissue cutting angle, tissue thickness, tissue deterioration, tissue orientation on the slide, slide preparation defects, patient disease, individual differences, digitization setup, staining, etc. However, many of these erroneous labels can be corrected subsequently through the cycle of analysis-and-detection by the semantic analyzer with the help of the knowledge base and the fine feature detectors.

V. SEMANTIC ANALYSIS OF HISTOLOGICAL LABELS

A. Semantic Reasoning, Confusion Matrix, and Contextual Knowledge

The semantic analyzer is initially presented with the three matrices (the label map) produced by the coarse and semifine feature detectors and has to identify potentially erroneous labels and correct them. The most obvious cue for the semantic analyzer is that the labels obtained from different visual detectors may conflict with each other, and the labels produced by the same detector may also conflict with each other. In Fig. 5, the fine detector identifies some regions as appendix mucosa (represented as 13), stomach fundus glands (55), small intestine intestinal glands (46), anus epithelium (2), etc., which we know cannot histologically appear in the same image. In another area, the color histogram suggests mucosa while the Gabor filter identifies lumen.

The semantic analyzer must, therefore, reconcile mutually conflicting classifications. Given the label maps, which contain possibly erroneous labels, the semantic analyzer aims to do the following.

1) Improve the accuracy of the recognition results using high-level histological and contextual knowledge. It may also invoke fine detectors to confirm its reasoning hypotheses.

2) Analyze the semantic content of the whole image and record the semantic content in Papillon, which forms the basis for generating a textual annotation for the image.

The analysis or reasoning process in the semantic analyzer is closely correlated with the content of the knowledge base, which includes the prior knowledge of all semantic features at the two levels, covering their legitimate visual and histological contextual attributes. In particular, the information generated from the confusion matrices is also recorded and updated in the knowledge base. The confusion matrices, which record performance statistics, are computed from the training and testing procedures as well as from the previous performance of the detectors on data. This gives us information on how similar any two detected features are to each other. For instance, among the 90 cases of detection for adipose tissue, the detector recognized it as adipose tissue only three times. For the rest of the cases, the detector recognized it as hair follicle once, as connective tissue twice, as lumen 19 times, as muscularis mucosae three times, ten times as the junction between lumen and stomach foveolae, and so on.

Such information is then summarized in the knowledge base, at both the coarse and fine levels, as a set of possible confusion candidates for each feature, ordered descendingly by a similarity percentage that describes how similar any two features are, or how many other features are similar to a given feature and to what degree. The system refines the knowledge base through the classification process as the confusion matrices are updated.
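For illustration, ranked confusion candidates can be derived from a confusion matrix by row-normalizing and sorting. A sketch with an invented four-class matrix:

    import numpy as np

    # Rows: true feature; columns: detected feature (toy 4-class example,
    # loosely echoing the adipose-tissue statistics quoted above).
    confusion = np.array([
        [ 3.,  1.,  2., 19.],   # true label 0 is mostly detected as label 3
        [ 0., 40.,  5.,  5.],
        [ 2.,  6., 30.,  2.],
        [10.,  0.,  1., 50.],
    ])

    def confusion_candidates(conf, feature):
        """Other labels sorted by how often `feature` is mistaken for them,
        i.e., the 'similarity percentage' stored in the knowledge base."""
        row = conf[feature] / conf[feature].sum()
        order = np.argsort(row)[::-1]               # descending similarity
        return [(int(j), float(row[j])) for j in order if j != feature]

    print(confusion_candidates(confusion, 0))
    # -> [(3, 0.76), (2, 0.08), (1, 0.04)]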

When more than one detector recognizes the same type of feature, which in our system is the case for features at the coarse level, the semantic analyzer forms its own primitive label matrix according to the relative accuracies of the different detectors for a given class, as recorded in the detectors' confusion matrices. The analysis starts from those subimages that have a high probability of being correct, such as Lumen, and gradually expands the scope of analysis based on the previously analyzed labels. When reasoning with contextual knowledge, the confusion candidates are compared, and one is chosen to correct a wrong label if it is more coherent with the context. The analyzer estimates which organ along the GI tract the image originated from. This estimate gives a list of features that are histologically consistent with this organ. If any feature label inconsistent with this organ is found on the label map, another feature that is consistent with the organ is chosen from its confusion candidates. If this fails, the analyzer chooses the next most similar confusion candidate that can appear in any organ and is coherent with the context. If all these fail, i.e., neither kind of similar feature can be found, the label is changed to the majority label among its neighbors.
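This fallback chain might look as follows in code; all inputs and label names are hypothetical:

    from collections import Counter

    def correct_label(label, organ_features, candidates, anywhere, neighbors):
        """Fallback chain for an organ-inconsistent label.
        label          : the suspect fine label
        organ_features : labels consistent with the estimated organ
        candidates     : confusion candidates, most similar first
        anywhere       : labels that may appear in any organ (e.g., blood)
        neighbors      : labels of the surrounding subimages
        """
        if label in organ_features:
            return label                                # nothing to correct
        for cand in candidates:
            if cand in organ_features:                  # organ-consistent candidate
                return cand
        for cand in candidates:
            if cand in anywhere:                        # organ-neutral candidate
                return cand
        return Counter(neighbors).most_common(1)[0][0]  # majority of neighbors

    # Example: a stomach image containing a label only seen in the appendix.
    print(correct_label("appendix_mucosa",
                        organ_features={"fundus_glands", "lumen"},
                        candidates=["fundus_glands", "blood"],
                        anywhere={"blood"},
                        neighbors=["lumen", "lumen", "fundus_glands"]))
    # -> fundus_glands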

There are many different operations involved in the analysis procedure, e.g.: 1) neighbor analysis and region analysis; 2) region searching and grouping; 3) finding a canonical axis of the image content, which defines how the image should be viewed; this information forms the basic orientation of the image, which gives the reference for describing spatial relationships; 4) finding boundaries of histological objects; and 5) spatial relationship analysis, etc. This is an iterative process in which the subimages interact with, support, or refute each other and finally arrive at a more coherent and consistent label matrix. We call the above procedure semantic reasoning.

B. Analysis and Detection

During the reasoning, when the semantic analyzer needs to confirm certain features in a region, it invokes one or more fine detectors to carry out specific analyses. To do this, the semantic analyzer generates a list of feature hypotheses, which happens under the following conditions.

1) A confusion candidate is chosen from the knowledge base. For example, the semantic analyzer may have identified that the image comes from the esophagus, while in one area the subimage was detected as appendix muscularis externa. From the confusion candidate list of appendix muscularis externa, esophagus epithelium is the next most similar feature, and it is coherent with the context. The semantic analyzer will tentatively change the label to esophagus epithelium and then trigger the fine detector Oesophagus: epithelium (no. 10 in Table II).

2) No suitable confusion candidate is available from the knowledge base.

3) The semantic analyzer has very weak confidence in the confusion candidates. Whether a candidate is weak is judged from the similarity percentage calculated from the confusion matrices. In cases 2) and 3), the semantic analyzer triggers more fine detectors according to the context, including the fine detector for the analyzed feature itself.

4) The semantic analyzer expects certain features to be present, but they have not been recognized within the label map. This happens when two features normally appear together while only one has been confirmed.

The fine feature detectors are a set of detectors specially designed to examine the visual properties of particular fine features that may need to be further confirmed. The design principle of these detectors is based on spatial features such as shape, structure, and spatial relationships. A set of morphological parameters, such as shape, contour, distance to the lumen, neighbor configuration, etc., is used in these fine detectors. The accuracy rates of such detectors are high, but they require much more computation than the statistical coarse and semifine feature detections and, hence, should only be invoked on demand. At present, 12 main fine feature detectors, subdivided into the 20 fine feature detectors listed in Table II, have been implemented. When invoking a specialized feature detector, the semantic analyzer passes along a region of interest (ROI) to the detector, which subsequently returns a confidence value to the semantic analyzer. In the end, the semantic analyzer compares the returned confidence values from the different detectors to verify the hypotheses. This process of analysis and detection may go through several iterations before arriving at a stable result.
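In outline: pass an ROI to each triggered detector, collect confidence values, keep the best. A sketch in which the detector callables and their constant scores are stand-ins:

    # Stand-in fine detectors: each maps (image, roi) -> confidence in [0, 1].
    def detector_oesophagus_epithelium(image, roi):
        return 0.8                                  # toy constant confidence

    def detector_anus_epithelium(image, roi):
        return 0.3

    def verify_hypotheses(image, roi, hypotheses):
        """hypotheses: list of (label, detector) pairs triggered by the
        semantic analyzer; returns the label with the highest confidence."""
        scored = [(det(image, roi), label) for label, det in hypotheses]
        confidence, label = max(scored)
        return label, confidence

    label, confidence = verify_hypotheses(
        image=None, roi=(0, 0, 64, 64),
        hypotheses=[("oesophagus_epithelium", detector_oesophagus_epithelium),
                    ("anus_epithelium", detector_anus_epithelium)])
    print(label, confidence)                        # -> oesophagus_epithelium 0.8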

As an example, when the semantic analyzer hypothesizes that a region is probably anus epithelium, it asks the anus epithelium detector to compute a confidence value for the hypothesis. The detector starts from a Gaussian-filtered image and combines it with the ROI to form a binary image according to a preset threshold value. Based on the binary image, the detector extracts every isolated island (group of connected pixels) and examines its color content, neighbor intensity, size, distance to the lumen, and its boundary with the lumen. These parameters carry different weights in the confidence value. After the computation, the detector passes the confidence value back to the semantic analyzer. Since the semantic analyzer may request several detectors to verify hypotheses at the same time, it selects the one with the highest confidence value.
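The island-extraction step can be sketched with standard tools; the threshold and the size-only scoring below are illustrative simplifications of the weighted combination described above:

    import numpy as np
    from scipy.ndimage import gaussian_filter, label

    def epithelium_confidence(gray, roi_mask, threshold=0.5):
        """Toy confidence: smooth, threshold within the ROI, extract connected
        'islands,' and score them. I-Browse additionally weighs each island's
        color content, neighbor intensity, and distance/boundary to the lumen;
        here island size alone stands in for that weighted combination."""
        smoothed = gaussian_filter(gray, sigma=2.0)
        binary = (smoothed > threshold) & roi_mask
        islands, count = label(binary)              # connected components
        if count == 0:
            return 0.0
        largest = max(np.sum(islands == k) for k in range(1, count + 1))
        return min(1.0, largest / roi_mask.sum())   # crude size-based score

    gray = np.random.rand(128, 128)
    roi = np.zeros((128, 128), dtype=bool)
    roi[32:96, 32:96] = True
    print(epithelium_confidence(gray, roi))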

The fine feature detectors were developed using pattern recognition techniques suited to the particular image features. As shown in Fig. 6, the fine detector can extract most of the Anus Epithelium, which is indicated with the letter "A" (light color). The amount of missing Anus Epithelium area, indicated by the letter "B" (dark color), is acceptable when compared to the whole epithelium layer. On the other hand, the detector rules out those areas not belonging to the anus epithelium.

Fig. 6. Extracted Anus Keratinised Squamous Epithelium (A) and missing area (B) by the fine feature detector.

This detector was designed using five characteristics: 1) the distance of the epithelium layer to the Lumen; 2) the epithelium intensity; 3) the intensity of the epithelium layer at the Lumen boundary; 4) the tissue intensity near the epithelium layer (excluding the Lumen area); and 5) the edge magnitude at the boundary of the epithelium layer with tissues other than the Lumen. To demonstrate the efficiency of the Anus Epithelium detector, we ran an experiment on a small database containing 90 images; the accuracy rate for identifying anus epithelium was 76.9%. It should be noted that these specialized detectors are not applied at the start of analyzing an image. This is not only because they are normally time-consuming, but also because, strategically, it is unwise to apply many specialized detectors indiscriminately to an unknown image at the beginning, given that images in a large collection may vary greatly.

C. Final Label Map and Semantic Content

The semantic analyzer finally delivers two types of output: one is the final label map, and the other is the semantic content as represented in Papillon. Fig. 7 shows the final label map superimposed on the same image as in Fig. 5. Fig. 9 shows a section of the automatically generated Papillon structure for the semantic content of the image in Fig. 8, where the key features recognized in Papillon are illustrated. In this example, the semantic content is represented in a forest structure holding the relationships between features and information about the features, such as where the image comes from (the appendix in this example), the color, shape, size, quantity, spatial relationships, and locations of the features, the axis, and how to view the image. The semantic content of the whole image is much more exhaustive than the sample content shown here.

D. Evaluation of the Semantic Reasoning Approach

To evaluate the advantage of using semantic reasoning to supplement visual content analysis in tissue image interpretation, 2957 histological subimages, or feature units, were selected at random from the images, and we examined how these subimages were classified into the 63 fine features. Comparing the subimages before and after reasoning, 1042 of the 2957 features were successfully corrected by the semantic analyzer, which means the accuracy of the whole-image feature extraction was improved by 35.2%; 1663 features (about 56.2%) remained correct; 205 features (about 6.9%) remained wrong; and 47 features (about 1.6%) changed from correct to incorrect.

Fig. 7. Final label map for the image in Fig. 5 after semantic analysis.

Fig. 8. Sample image to illustrate some key histological features.

Fig. 9. Part of the Papillon structure generated by the semantic analyzer for the sample image shown in Fig. 8.

Since the semantic analysis is based on the results from the other engines, the performance of the semantic analyzer can be improved incrementally by: 1) implementing more fine feature detectors; 2) improving the performance of the existing visual feature detectors; and 3) improving the descriptive power of the knowledge base by adding more knowledge details for certain features.

VI. USING SEMANTIC CONTENT IN IMAGE RETRIEVAL

Three associated similarity measures were designed to compare the most frequent semantic labels, the local neighbor patterns of semantic labels, and the semantic label frequency distributions.

Most frequent semantic label (MFSL) similarity is based on the 15 coarse histological regions of the image, which in fact roughly describe how the coarse features like Lumen, Mucosa, Submucosa, the junctions, etc., are distributed in the image. MFSL is computed by the following procedure, where c is a constant equal to 0.75 in this experiment:

    Similarity := 0
    For all windows do
        winSimilarity := 0
        Sort the coarse labels of the retrieved image by frequency
        For each coarse label of the retrieved image do
            if the coarse label is the same as the most frequent
                    coarse region of the query image then
                winSimilarity := winSimilarity + c
        End For loop
        Similarity := Similarity + winSimilarity
    End For loop
    MFSL := Similarity / number of windows
Neighborhood similarity (NS) uses a matrix to record the co-occurrence frequencies of the 63 histological labels of the eight nearest neighbors against the label of the center subimage. Element (i, j) of the matrix records how many times the center subimage has histological label i while the histological label of one of its neighbors is equal to j. Since there are 63 histological labels employed in the system, the matrix size is 63 × 63. NS is defined by (1)–(3), where N_L and N are, respectively, the number of subimages labeled as Lumen and the total number of subimages in an image; F is the total number of fine histological labels; N_8 is the number of subimages that have eight connected neighbors; and s_q and s_r are the scaling factors that eliminate the influence of the number of Lumen subimages in the query and retrieved images, respectively. The scaling factor is included in the equations in order to reduce the influence of the noncharacteristic Lumen feature in any image.

[Equations (1)–(3), defining the scaling factors and the NS similarity, are not legible in the source.]
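The co-occurrence matrix underlying NS can be computed directly; a sketch (the normalization of (1)–(3) is not reproduced, so only the raw matrix is shown):

    import numpy as np

    N_LABELS = 63

    def cooccurrence(label_map):
        """63 x 63 matrix M where M[i, j] counts how many times a center
        subimage with label i has an 8-connected neighbor with label j.
        Only interior subimages (with all eight neighbors) contribute."""
        M = np.zeros((N_LABELS, N_LABELS), dtype=int)
        rows, cols = label_map.shape
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                center = label_map[r, c]
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        if dr or dc:
                            M[center, label_map[r + dr, c + dc]] += 1
        return M

    label_map = np.random.randint(0, N_LABELS, size=(13, 17))
    M = cooccurrence(label_map)
    # Comparing the query and retrieved images then amounts to comparing
    # their two matrices, scaled to damp the influence of Lumen subimages.
    print(M.sum())    # 8 * number of interior subimages = 8 * 11 * 15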

Semantic label frequency distribution similarity (SLDS) directly counts the frequencies of the 63 fine histological labels occurring in the image. For each image, the system only needs to compare the 63 entries; therefore the computation time is much shorter than that of NS. SLDS is defined by (4) and (5), where f_q,i and f_r,i are, respectively, the frequencies of fine histological label i in the query and retrieved images, and s_q and s_r have the same meanings as in the NS measure.

[Equations (4) and (5), defining the SLDS similarity, are not legible in the source.]
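Because SLDS reduces each image to a 63-entry frequency vector, the comparison is a histogram distance. A sketch, with a normalized histogram overlap standing in for the exact form of (4)–(5):

    import numpy as np

    N_LABELS = 63

    def label_frequencies(label_map):
        """Count how often each of the 63 fine labels occurs in an image."""
        return np.bincount(label_map.ravel(), minlength=N_LABELS)

    def slds(query_map, retrieved_map):
        """Histogram-overlap similarity between two label maps; only 63
        entries are compared, hence much cheaper than the NS measure."""
        fq = label_frequencies(query_map)
        fr = label_frequencies(retrieved_map)
        return np.minimum(fq, fr).sum() / max(fq.sum(), 1)

    q = np.random.randint(0, N_LABELS, size=(13, 17))
    r = np.random.randint(0, N_LABELS, size=(13, 17))
    print(round(slds(q, r), 3))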

Fig. 10. Retrieval interface of the I-Browse system.

An experiment was carried out to test the accuracy of these measures. The accuracy rates using NS and SLDS are, respectively, 78% and 80%, higher than that of MFSL, which uses less semantic information in the measurement. The retrieval accuracy rate depends on the accuracy of the histological label detection as well as on the design of the similarity functions. Another important factor in image retrieval is computation time. On a dual Intel PII-450 computer, the computation time of the label analysis is 5.32 s. The similarity comparison time between one image and another is, respectively, 6 ms, 0.23 s, and 4.3 ms for MFSL, NS, and SLDS. It is obvious that SLDS is so far the best choice among all the current similarity measures.

In addition to the analyses discussed in the paper, in order to demonstrate the advantage of integrating semantics into the system, we also compared I-Browse with the QBIC system developed by IBM [9], which was designed as a general content-based retrieval system. We used a collection of 19 890 subimages, with 2873 from the anus, 3536 from the appendix, 3094 from the large intestine, 3536 from the esophagus, 3757 from the small intestine, and 3094 from the stomach. In I-Browse, the NS and SLDS measures were used, while in QBIC the Color, Texture, and Spatial measures were chosen for this experiment. The accuracy rate of the similarity measures in I-Browse was on average up to 34.3% higher than that of the three measures in QBIC within the first five retrieved images. I-Browse performed better than QBIC in this benchmark test, and we attribute this primarily to the use of domain knowledge to improve the semantic interpretation.

The Retrieval interface is shown in Fig. 10, where a query image of the stomach is submitted and displayed at the top left corner of the figure. The five images most similar to the query image are displayed in the right-hand column of the window. The most similar image apart from the query image itself is placed at the bottom left corner in Fig. 10. Moreover, if the user moves the mouse over any subimage, a small yellow textual annotation tag is shown next to the subimage describing it (not displayed in Fig. 10). If the right mouse button is clicked, a new textual annotation window that describes the content of the whole image is displayed in the interface; it is placed at the top right corner in Fig. 10.

VII. CONCLUSION

The major contributions of the I-Browse prototype are: the association of semantic meanings with visual properties; the integration of syntactic visual analysis and semantic reasoning in connection with the contextual knowledge; and the design of Papillon, which fuses information from different media to facilitate intelligent application tasks. As a long-term target, we aim to develop this approach further so that we can automatically classify a wider range of histological images, identify normal and abnormal images, and ultimately use it as an automatic screening tool. It has also been recognized by the consultant pathologists that the annotation functionality provided in I-Browse is already useful as a doctors' reference and reminder. Although I-Browse is a specialized system for GI tract images, the architecture and the analysis engines in the system have been designed so that they can be generalized, and not only to other types of histological images. As the general system architecture of I-Browse is domain independent, to apply the approach to other applications we can simply follow the design procedure of identifying a set of meaningful semantic labels for the particular domain and then training the appropriate set of feature detectors to identify the semantic labels in images. Semantic analysis is then carried out according to the set of domain-specific rules. We are currently planning to extend the approach to a diversity of other application domains, including an art gallery of oriental paintings and a library of herbarium specimens.

ACKNOWLEDGMENT

The authors thank their medical collaborators: Dr. K. C. Lee, Consultant Pathologist and Chief of Service, Princess Margaret Hospital, Hong Kong; Dr. E. Sims, Consultant Pathologist, Royal Bolton Hospital, Bolton, U.K.; and Dr. J. Rashbass, Consultant Pathologist, Addenbrooke's Hospital, Cambridge, U.K. The authors also thank Dr. R. Lam and Dr. K. Cheung for their cooperation in developing the visual feature detectors and the software framework for this work.

REFERENCES

[1] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based image retrieval at the end of the early years," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, pp. 1348–1381, Dec. 2000.
[2] C. Colombo, A. Del Bimbo, and P. Pala, "Semantics in visual information retrieval," IEEE Multimedia, pp. 38–53, July–Sept. 1999.
[3] S. R. Fountains and T. N. Tan, "Efficient rotation invariant texture features for content-based image retrieval," Pattern Recogn., vol. 31, no. 11, pp. 1725–1732, 1998.
[4] R. Singh and N. P. Papanikolopoulos, "Planar shape recognition by shape morphing," Pattern Recogn., vol. 33, pp. 1683–1699, 2000.
[5] A. Del Bimbo, M. De Marsico, S. Levialdi, and G. Peritore, "Query by dialog: An interactive approach to pictorial querying," Image Vision Comput., vol. 16, pp. 557–569, 1998.
[6] D.-H. Lee and H.-J. Kim, "Fast content-based indexing and retrieval technique by the shape information in large image database," J. Syst. Softw., vol. 56, no. 2, pp. 165–182, Mar. 2001.
[7] T. Gevers and A. W. M. Smeulders, "Content-based image retrieval by viewpoint-invariant color indexing," Image Vision Comput., vol. 17, pp. 475–488, 1999.
[8] A. F. Abate, M. Nappi, G. Tortora, and M. Tucci, "IME: An image management environment with content-based access," Image Vision Comput., vol. 17, pp. 967–980, 1999.
[9] IBM Ultimedia Manager System [Online]. Available: http://www/qbic.almaden.ibm.com/~qbic/qbic.html
[10] J. Laaksonen, M. Koskela, S. Laakso, and E. Oja, "PicSOM: Content-based image retrieval with self-organizing maps," Pattern Recogn. Lett., vol. 21, pp. 1199–1207, 2000.
[11] M. S. Kankanhalli, B. M. Mehtre, and H. Y. Huang, "Color and spatial feature for content-based image retrieval," Pattern Recogn. Lett., vol. 20, pp. 109–118, 1999.
[12] A. Martinez and J. R. Serra, "Semantic access to a database of images: An approach to object-related image retrieval," in Proc. 1999 6th Int. Conf. Multimedia Comput. Syst., vol. 1, 1999, pp. 624–629.
[13] J. M. Corridoni, A. Del Bimbo, and E. Vicario, "Image retrieval by color semantics with incomplete knowledge," J. Amer. Soc. Inform. Sci., vol. 49, no. 3, pp. 267–282, Mar. 1998.
[14] G.-H. Cha and C.-W. Chung, "Indexing and retrieval mechanism for complex similarity queries in image databases," J. Visual Commun. Image Representation, vol. 10, no. 3, pp. 268–290, 1999.
[15] A. Jaimes and S.-F. Chang, "Conceptual framework for indexing visual information at multiple levels," in Proc. SPIE, vol. 3964, San Jose, CA, Jan. 28, 2000, pp. 2–15.
[16] A. Vailaya, A. Jain, and H. Zhang, "On image classification: City images vs. landscapes," Pattern Recogn., vol. 31, no. 12, pp. 1921–1935, 1998.
[17] W. Al-Khatib, Y. F. Day, A. Ghafoor, and P. B. Bruce, "Semantic modeling and knowledge representation in multimedia databases," IEEE Trans. Knowledge Data Eng., vol. 11, pp. 64–80, Jan.–Feb. 1999.
[18] C. R. Shyu, C. E. Brodley, A. C. Kak, A. Kosaka, A. Aisen, and L. Broderick, "Local versus global features for content-based image retrieval," in Proc. IEEE Workshop Content-Based Access Image Video Libraries, 1998, pp. 30–34.
[19] L. H. Y. Tang, R. Hanka, and H. H. S. Ip, "A review of intelligent content-based indexing and browsing of medical images," Health Inform. J., vol. 5, no. 1, pp. 40–49, Mar. 1999.
[20] F. Schnorrenberg, C. S. Pattichis, K. C. Kyriacou, and C. N. Schizas, "Computer-aided detection of breast cancer nuclei," IEEE Trans. Inform. Technol. Biomed., vol. 1, pp. 128–140, June 1997.
[21] P. W. Hamilton, P. H. Bartels, D. Thompson, N. H. Anderson, R. Montironi, and J. M. Sloan, "Automated location of dysplastic fields in colorectal histology using image texture analysis," J. Pathology, vol. 182, pp. 68–75, 1997.
[22] (2002, July 31). [Online]. Available: http://path.upmc.edu/cpi/cpi-res.html#TIS
[23] M. J. Varga and P. G. Ducksbury, "Application of content-based image compression to telepathology," in Proc. SPIE, vol. 4681, San Diego, CA, Feb. 23–28, 2002.
[24] P. R. Wheater, H. G. Burkitt, and V. G. Daniels, Functional Histology. London, U.K.: Churchill Livingstone, 1993, sec. 14.
[25] L. H. Tang, "Semantic Analysis of Image Content for Intelligent Retrieval and Automatic Annotation of Medical Images," Ph.D. dissertation, Univ. Cambridge, Cambridge, U.K.
[26] K. K. T. Cheung, R. W. K. Lam, H. H. S. Ip, R. Hanka, L. H. Y. Tang, and G. Fuller, "An object-oriented framework for content-based image retrieval based on 5-tier architecture," in Proc. Asia-Pacific Software Eng. Conf. '99, Takamatsu, Japan, Dec. 7–10, 1999, pp. 174–177.
[27] R. W. K. Lam, H. H. S. Ip, K. K. T. Cheung, L. H. Y. Tang, and R. Hanka, "A multi-window approach to classify histological features," in Proc. Int. Conf. Pattern Recognition, vol. 2, Barcelona, Spain, Sept. 2000, pp. 259–262.

H. Lilian Tang received the B.Eng. and M.Eng. degrees in computer science from Northeastern University, China, in 1989 and 1992, respectively, and the Ph.D. degree in medical informatics from the University of Cambridge, Cambridge, U.K., in 2000.

She is currently a Lecturer in the Department of Computing, University of Surrey, Surrey, U.K. Her research interests include intelligent multimedia information retrieval, natural language processing, machine translation, medical informatics, and content-based image retrieval.

Rudolf Hanka (M’70) was born in Prague, Czecho-slovakia, and received the M.Sc. degree in electricalengineering from the Czech Technocal University(CVUT), Prague, Czech Republic, in 1961, the M.A.degree from the University of Cambridge, U.K.,in 1972, and the Ph.D. degree in statistical patternrecognition from the Strathclyde University, U.K.,in 1976.

He is currently a Visiting Professor of Knowledge Management at the Faculty of Management, University of Economics (VSE), Prague, Czech Republic, and Head of the Medical Informatics Unit and Director of the Centre for Clinical Informatics, University of Cambridge. He has been a Visiting Professor of the First Military Medical University, Guangzhou, China, since 1999. His research interests lie in applications of statistics and computing to medicine, with an additional interest in knowledge management. Given this broad range of interests, he has published in a wide cross section of disciplines and application areas, including black-box modeling, statistical pattern recognition, image processing, medical diagnosis, and knowledge management.

Dr. Hanka is a Member of the British Computer Society and the New York Academy of Sciences and a Fellow of the Royal Statistical Society and the Royal Society of Medicine.

Horace H. S. Ip (M’91) received the B.Sc. degree(with first-class honors) in applied physics and thePh.D. degree in image processing from UniversityCollege, London, U.K., in 1980 and 1983, respec-tively.

Currently, he is Chair Professor and Head of the Computer Science Department and founding Director of the AIMtech Centre (Centre for Innovative Applications of Internet and Multimedia Technologies) at the City University of Hong Kong, Kowloon, Hong Kong. His research interests include image processing and analysis, pattern recognition, hypermedia computing systems, and computer graphics. He is a Member of the Editorial Boards of the Pattern Recognition Journal, The Visual Computer, the International Journal of Multimedia Tools and Applications, and the Chinese Journal of CAD and Computer Graphics, and a guest editor for the international journals Real-Time Imaging and Real-Time Systems. He has published more than 120 papers in international journals and conference proceedings.

Dr. Ip is a Fellow of the Hong Kong Institution of Engineers (HKIE) and a Fellow of the Institution of Electrical Engineers (IEE), United Kingdom. He serves on the International Association for Pattern Recognition (IAPR) Governing Board and served as founding Co-Chair of its Technical Committee on Multimedia Systems. He is currently the Vice-Chairman of the China Computer Federation Technical Committee on CAD and Computer Graphics. He was the Chairman of the IEEE (Hong Kong Section) Computer Chapter, a Council Member of the Hong Kong Computer Society, and the Founding President of the Hong Kong Society for Multimedia and Image Computing.

