Pattern Analysis & Applications (1999) 2:111–128  © 1999 Springer-Verlag London Limited

The Applicability of Neural Networks to Non-linear Image Processing

D. de Ridder, R.P.W. Duin, P.W. Verbeek and L.J. van Vliet
Pattern Recognition Group, Applied Physics Department, Delft University of Technology, Delft, The Netherlands

Abstract: In this paper, the applicability of neural networks to non-linear image processing problems is studied. As an example, the Kuwahara filter for edge-preserving smoothing was chosen. This filter is interesting due to its non-linear nature and natural modularity. A number of modular networks were constructed and trained, incorporating prior knowledge in various degrees, and their performance was compared to standard feed-forward neural networks (MLPs). Based on results obtained in these experiments, it is shown that several key factors influence neural network behaviour in this kind of task. First, it is demonstrated that the mean squared error criterion used in neural network training is not representative for the problem. To be able to discern performance differences better, a new error measure for edge-preserving smoothing operations is proposed. Secondly, using this measure, it is shown that modular networks perform better than standard feed-forward networks. The latter type often ends up in linear approximations to the filter. Finally, inspection of the modular networks shows that, although analysis is difficult due to their non-linearity, one can draw some conclusions regarding the effect of design and training choices. The main conclusion is that neural networks can be applied to non-linear image processing problems, provided that careful attention is paid to network architecture, training set sampling and parameter choice. Only if prior knowledge is used in constructing the networks and sampling the datasets can one expect to obtain a well performing neural network filter.

Keywords: Edge-preserving smoothing; Image processing; Neural network architectures; Non-linear filtering; Quantitative performance measures

Received: 28 May 1998
Received in revised form: 22 September 1998
Accepted: 16 October 1998

1. INTRODUCTION

In image processing practice, one regularly encounters applications for which classic image processing algorithms do not suffice. These applications, such as seismic image analysis, fingerprint analysis or medical imaging, often require an adaptive filtering solution, i.e. a data-dependent approach. At the same time, adaptive linear techniques are often insufficiently powerful to solve these problems. Since neural networks are known to be data-driven universal approximators of non-linear functions [1], they may help us to find new ways of tackling these problems. This has been recognised before, e.g. in work by Spreeuwers [2] and Pugmire et al [3], who showed that neural networks can be of use in convolution-like operations such as image restoration and edge detection. Neural networks have been applied as morphological filters by Shih and Moh [4] and Yin et al [5]. Recently, Jahn [6] described a cellular neural network for a method of edge-preserving smoothing related to anisotropic diffusion [7].

In classification tasks, neural networks have been used as image feature detectors in a convolution-like way by (amongst others) Fukushima [8] and Le Cun et al [9]. In the field of signal processing, neural networks have also been applied as non-linear filters (see Luo and Unbehauen [10] for an overview). Somewhat further removed from the work presented in this paper are the applications of neural networks to image processing problems in a more general sense: filter parameter tuning [11,12], target recognition [13] and segmentation [12], which are a form of classification, image description and compression [14] and the study of perception and vision [15], concerning finding the building blocks of images, and image restoration [16].

Although our interest lies in creating better filters using neural networks, the emphasis in the basic research reported on in this paper is not on an actual application. Our main goal is to see whether standard feed-forward neural networks can be applied successfully to a non-linear image filtering problem. If so, what are the prerequisites for obtaining a well-functioning network? Secondly, our goal is to see whether these networks correspond to classic approaches to solve such a task: we study them to improve our understanding of image processing operations and perhaps in time to find new approaches for difficult image processing tasks.

To investigate the possibilities of using feed-forward neural networks (or multi-layer perceptrons, MLPs) and the problems one might encounter, the research was concentrated on a single example of a non-linear filter: the Kuwahara filter for edge-preserving smoothing [17]. Since this filter is well understood and the training goal is exactly known, it is possible to investigate to what extent neural networks are capable of performing the same task. The Kuwahara filter also is an excellent object for this study because of its inherent modular structure, allowing us to split the problem into smaller parts. This is known to be an advantage in learning [18], and enables us to study subproblems in isolation. Recently, Pugmire et al [3] looked at the application of neural networks to edge detection and found that structuring learning in this way can improve performance; however, they did not investigate the precise role this structuring plays.

One of the first issues to address when applying a neural network is the choice of architecture. The question is to what extent the use of prior knowledge is required. A variety of design choices is investigated, covering the range between completely hand-designed, modular networks (following e.g. Shih and Moh [4] in their neural network implementation of morphological operations) and standard feed-forward networks. With the resulting networks, a number of experiments is performed. The results give rise to some interesting findings.

First, the experiments indicate that the problem of constructing a training set is not a trivial one. Two ways of doing so will be compared. Secondly, one of the most interesting results of the experiments is that it seems that whatever method is used to train a neural network to perform a Kuwahara filtering, the resulting performance (measured by the mean squared error, or MSE) is more or less the same. However, visual inspection suggests otherwise. To be able to differentiate better between the various neural networks, a new performance measure for edge-preserving smoothing is introduced. Finally, we take a look inside the 'black boxes' that neural networks are usually regarded to be. The functionality of the modules used in the modular networks is investigated after training. Although the results are hard to analyse, some general conclusions regarding the influence of training parameters can be drawn. It is also shown that some of the standard feed-forward networks, which have a poor performance with respect to our performance measure, have learned a linear approximation to the Kuwahara filter.

The outline of this paper is as follows: the Kuwahara filter will be discussed in Section 2 and a description of the various neural network architectures used will be given in Section 3. In Section 4, the experiments performed using these networks will be described, followed by a discussion of the questions raised by the results obtained in these experiments. The next few sections are devoted to answering these questions: in Section 5, the influence of data set sampling is investigated; in Section 6 the appropriateness of the MSE as error measure is discussed and a new performance measure is presented. Sections 7 and 8 deal with inspection of the modular and standard feed-forward networks, respectively. Finally, in Section 9 the conclusions and some ideas regarding open research questions are presented.

2. KUWAHARA FILTERING

The Kuwahara filter is used to smooth an image while preserving the edges [17,19,20]. Figure 1(a) illustrates the operation of the filter. The input of the filter is a (2k − 1) × (2k − 1) pixel neighbourhood around the central pixel. This neighbourhood is divided into four overlapping subwindows W_i, i = 1, 2, 3, 4, each of size k × k pixels. For each of these subwindows, the mean μ_i and the variance σ_i² of the k² grey values are calculated. The output of the filter is then found as the mean μ_m of the subwindow W_m having the smallest grey value variance (m = arg min_i σ_i²). This operation can be applied in a convolution-like manner to filter an entire image. For an example of the effect of the filter see Fig. 7.

The filter is non-linear and the selection of the subwindow based on the variances is data-driven. Edges are not blurred as in normal uniform filtering. As an edge will always lie in at most three subwindows, there will always be at least one subwindow that does not contain an edge and therefore has low variance. For neighbouring pixels in edge regions, different subwindows will be selected (due to the minimum operation), resulting in sudden large differences in grey value.
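To make the operation concrete, a minimal NumPy sketch of this filter is given below. The function name kuwahara, the reflective border handling and the brute-force loop are our own illustrative choices; the paper itself does not prescribe an implementation.

import numpy as np

def kuwahara(image, k=3):
    # For each pixel: take the (2k-1) x (2k-1) neighbourhood, form the four
    # overlapping k x k subwindows W1..W4, and output the mean of the
    # subwindow with the smallest grey-value variance (m = arg min_i sigma_i^2).
    pad = k - 1
    padded = np.pad(image, pad, mode='reflect')   # border handling is our choice
    out = np.empty(image.shape, dtype=float)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            win = padded[r:r + 2 * k - 1, c:c + 2 * k - 1]
            subs = (win[:k, :k], win[:k, k - 1:], win[k - 1:, :k], win[k - 1:, k - 1:])
            variances = [s.var() for s in subs]
            out[r, c] = subs[int(np.argmin(variances))].mean()
    return out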

This filter was used since:

• It is modular (Fig. 1(b) illustrates this). This gives us the opportunity to break down its operation into subtasks which can perhaps be more easily learned than the whole task at once. It will be interesting to see whether a neural network will need this modularity and complexity in order to approximate the filter's operation. Also, it offers the opportunity to study a network's operation in terms of the individual modules.

• It is non-linear. If neural networks can be put to use in image processing, the most rewarding application would be one to non-linear image processing. Neural networks excel in learning (seemingly) highly complex, non-linear tasks with many parameters using only a relatively small number of samples.

3. NEURAL NETWORK ARCHITECTURES

Although there is an abundance of network architectures, it was decided to use the most widely used type of neural network: a feed-forward network or MLP [21,22]. This type of network is well studied; the learning algorithms and pitfalls are well known and it has been shown that feed-forward networks are universal approximators [1], making them applicable to our problems. However, we know very little about the actual operation of the network, that is, the inner workings of the 'black box' many consider it to be (for a rather polemic discussion on this topic, see the excellent paper by Green [23]).

Fig. 1. (a) The Kuwahara filter: k × k subwindows in a (2k − 1) × (2k − 1) window; here k = 3; (b) Kuwahara filter operation as a sequence of operations.

When studying neural network properties, such as internal operation (which functions are performed by which hidden units) or generalisation capabilities, one often encounters a phenomenon which could be described as a neural network uncertainty principle. That is, there is a trade-off, controlled by restricting the architecture of a network, between the possibility of understanding how a trained network operates and the degree to which the experiment is still true-to-life. If an unrestricted neural network is trained on a real-life dataset, the setup most closely resembles the application of neural networks in everyday practice. However, the subtleties of the dataset and the many degrees of freedom in the network prevent us from gaining a deeper insight into the operation of the network. On the other side, once a network is restrained, e.g. by sharing or removing weights, lowering the number of degrees of freedom or constructing architectures only specifically applicable to the problem at hand (see e.g. Shih and Moh [4]), the situation is no longer a typical one. The network may even become too constrained to learn the task at hand. The same holds for editing a dataset to influence its statistics or to enhance more preferable features with regard to network training [24,25]. Note that this is not precisely the same issue as addressed by the bias-variance trade-off [26], which is concerned with the complexity of a model. Our concern is with the specificity of the model which, in principle, is unrelated to complexity: making a model more specific need not introduce a bias.

To cover this spectrum of possibilities, a number of modular neural networks with varying degrees of freedom was constructed. The layout of such a modular network is shown in Fig. 2. Of the modular networks, four types were created. These are discussed below in descending order of artificiality, i.e. the first is completely hand-designed, with every weight set to an optimal value, while the last consists of only standard feed-forward modules.

3.1. Modular Networks

The four modular neural network variations are:

Type I. For this type, the modules are hand-designed for the tasks they are to perform. In some cases, this means using other than standard (i.e. sigmoid, linear) transfer functions and very unusual weight settings. Figures 3–6 show the four module designs and the weights assigned to their connections:

The mean module (Fig. 3) uses only linear transfer functions in units averaging the inputs. Four of these modules are used to calculate μ_1, ..., μ_4.

The variance module (Fig. 4) uses a submodule (on the left) to calculate the mean of each of the four subwindows it is presented. The other submodule (on the right) just transports the original data to lower layers. The calculated means are then subtracted from the original inputs, followed by a layer of units using an f(x) = tanh(x²) transfer function to approximate the square of the input¹. Four of these modules are used to find σ_1², ..., σ_4².

The position-of-minimum module for selecting the position of the minimum of four inputs (Fig. 5) is the most complicated one. Using a

  f(x) = ln( 1 / (1 + exp(−x)) )    if x > 10⁻³
       = −10¹⁰                      if x ≤ 10⁻³                    (1)

transfer function, i.e. the logarithm of a sigmoid, units in the first three hidden layers act as switches comparing their two inputs. Alongside these switches, linear transfer function units are used to transport the original values to deeper layers. Weights W_A and W_B are very high to enable the units to act as switches. If the input connected using weight W_A (say input a) is greater than the input connected using weight W_B

(input b), the sum will be large and negative, the output of the sigmoid will approach 0.0 and the output of the unit will be −∞. If b > a, on the other hand, the sum will be large and positive, the output of the sigmoid part will approach 1.0 and the final output of

¹ This function is chosen since it approximates x² well on the interval it will be applied to, but is bounded: it asymptotically reaches 1 as the input grows to ±∞. The latter property is of importance for training the network.

Fig. 2. A modular neural network. The top layer is the input layer.

the unit will be 0.0. This output can be used as an inhibiting signal, by passing it to units of the same type in lower layers. In this way, units in the third hidden layer have as output – if inputs are denoted as s_1, s_2, s_3 and s_4:

  s_i = 0.0   if s_i < min_{k=1,...,4; k≠i} s_k
      = 0.5   otherwise                                            (2)

W_A and W_B are slightly different to handle cases in which two inputs are exactly the same but one (in this case arbitrary) minimum position has to be found. The fourth and fifth hidden layer ensure that exactly one output unit will indicate that the corresponding input was minimal, by setting the output of a unit to 0.0 if another unit to the right has an output ≠ 0.0. Finally, biases (indicated by values next to the units)

Fig. 3. The module for calculating the mean. The top layer is the input layer.

Fig. 4. The module for calculating the variance. The top layer is the input layer.

are used to let the outputs have the right value (0.0 or 0.5).

The selection module (Fig. 6) uses large weights coupled to the position-of-minimum module outputs (inputs s_1, s_2, s_3 and s_4) to block out the unwanted mean values μ_i before adding these. The small weights with which the mean values are multiplied and the large incoming weight of the output unit are used to avoid the non-linearity of the transfer function.

Since all the weights were fixed, these networks were not trained. (A small numerical sketch of the non-standard transfer functions used in these modules is given after the type descriptions below.)

Type II. The modules had the same architectures as those of Type I. However, in this case the weights were not fixed, hence the modules could be trained. These modules were expected to perform poorly, as some of the 'optimal weights' (as set in Type I) were very high.

Type III. In these modules, non-standard transfer functions were no longer used. As a result, the modules which were to be trained to calculate the variance and the position of the minimum had to be replaced by standard networks. The networks used contained two layers of 25 hidden units each, each of which had an f(x) = 2/(1 + exp(−x)) − 1 transfer function. This number of hidden units ensures that the networks have a large number of free parameters, but keeps training times feasible.

Type IV. In this final type, all modules consisted of normal networks with two hidden layers of 25 units each.
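The two non-standard transfer functions used in the Type I and Type II modules can be written down directly. The sketch below is our own illustration; the threshold 10⁻³ and the clipping value −10¹⁰ are taken from Eq. (1) as reconstructed above.

import numpy as np

def bounded_square(x):
    # tanh(x^2): behaves like x^2 near zero but saturates at 1 for large |x|,
    # which keeps the variance module trainable.
    return np.tanh(x ** 2)

def log_sigmoid_switch(x):
    # Eq. (1): the logarithm of a sigmoid, clipped for x <= 1e-3. The unit
    # outputs a very large negative value (standing in for minus infinity) when
    # its summed input is non-positive, and approximately 0 when it is large
    # and positive, so pairs of such units can act as comparators ('switches').
    return np.where(x > 1e-3, -np.logaddexp(0.0, -x), -1e10)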

With these four types, a move is made from a fixed, hard-wired type of network (type I), which is in many ways like a hardware implementation of the Kuwahara filter, to a free type (type IV) in which only the prior knowledge that the filter consists of four subtasks is used. Our hope is to see a gradual change in behaviour and performance.

Fig. 5. The module for finding the position of the minimum. The top layer is the input layer.

3.2. One-shot Networks

To validate results obtained with the networks described in the previous section, experiments were also performed with standard, fully connected networks having one or two hidden layers of 1, 2, 3, 4, 5, 10, 25, 50, 100 or 250 units each. These latter networks will be referred to as one-shot networks, as opposed to modular networks.

4. EXPERIMENTS

4.1. Data Sets

To train the neural networks, a dataset was constructed by taking random samples from image A (input) and its Kuwahara-filtered version (output), both shown in Fig. 7(a). The original 8-bit, 256 grey value image was converted to a floating point image, with grey values in the range [−0.5, 0.5]. Three datasets were constructed, containing 1000 samples each: a training set, a validation set and a test set. The validation set was used to stop training: if the error on the validation set did not drop below the minimum error found so far on that set for 1000 cycles, training was stopped. This prevents overtraining [21]. Since in all experiments only k = 3 Kuwahara filters were studied, the input to each network was a 5 × 5 region of grey values and the training target was 1 value. For the modular networks, additional datasets were constructed from these original datasets to obtain the mappings required by the individual networks (mean, variance, position-of-minimum and selection).
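A minimal sketch of this data set construction is given below, assuming image_a and its Kuwahara-filtered version target_a are floating-point arrays already scaled to [−0.5, 0.5]. The sample count and the 5 × 5 window follow the text; the function and variable names and the random number generator are our own.

import numpy as np

def sample_patches(image, target, n_samples, k=3, rng=np.random.default_rng(0)):
    # Draw n_samples random (2k-1) x (2k-1) input windows and, as training
    # target, the corresponding centre pixel of the filtered image.
    w = 2 * k - 1                                   # 5 x 5 for k = 3
    rows, cols = image.shape
    r = rng.integers(0, rows - w + 1, n_samples)
    c = rng.integers(0, cols - w + 1, n_samples)
    x = np.stack([image[i:i + w, j:j + w].ravel() for i, j in zip(r, c)])
    y = target[r + k - 1, c + k - 1]                # centre pixel of each window
    return x, y

# training, validation and test sets of 1000 samples each, as in the text:
# train_x, train_y = sample_patches(image_a, target_a, 1000)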

4.2. Training

For training, the standard stochastic backpropagation algorithm [27] was used. Weights were initialised to random values drawn from a uniform distribution in the range [−0.1, 0.1]. The learning rate was set to 0.1; no momentum was used. Training was stopped after 25,000 cycles or if the validation set indicated overtraining, whichever came first. All experiments were repeated five times with different random initialisations; all results reported are mean results of five experiments. Wherever appropriate, error bars indicate the standard deviation.
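The stopping rule can be summarised in a few lines. The sketch below assumes hypothetical train_one_cycle and validation_error routines and only mirrors the criterion stated above: stop after 25,000 cycles, or once the validation error has not improved on its minimum for 1000 cycles.

def train_with_early_stopping(net, max_cycles=25000, patience=1000):
    best_err, best_cycle = float('inf'), 0
    for cycle in range(max_cycles):
        train_one_cycle(net, learning_rate=0.1)   # stochastic backpropagation
        err = validation_error(net)
        if err < best_err:
            best_err, best_cycle = err, cycle     # new minimum on the validation set
        elif cycle - best_cycle >= patience:      # no improvement for 1000 cycles
            break
    return net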

4.3. Results

Training and testing results are given in Figs 8 and 9. These results will be discussed for the different architectures. For these first experiments, only the results obtained on the normal training and test sets (i.e. the darker bars) are of importance. The other results, using a differently sampled training set, will be discussed in Section 5.

Fig. 6. The module for selecting the right mean. The top layer is the input layer.

4.3.1. Modules. The different modules show rather different behaviour (Fig. 8). Note that in these figures the MSE was calculated on a test set of 1000 samples. As was to be expected, the MSE is lowest for the hand-constructed Type I modules: for all tasks except position-of-minimum it was 0. The error remaining for the position-of-minimum module may look quite high, but is caused mainly by the network choosing a wrong minimum when two or more input values s_i are very similar. Although the effect on the behaviour of the final module will be negligible, the MSE is quite high since one target value which should have been 0.5 is incorrectly set to 0.0, and vice versa, leading to an MSE of 0.25 for that input pattern. For the other types, it seems that if the manually set weights are dropped (type II), the modules are not able to learn their function as well as possible (i.e. as well as type I). Nonetheless, the MSE is quite good and comparable to types III and IV.

When the individual tasks are considered, the mean is obviously the easiest function to approximate. Only for type IV, in which a standard module with two hidden layers was used, is the MSE unequal to 0. The variance, too, is not difficult: MSEs are O(10⁻⁵). Clearly, the position-of-minimum task is the hardest. Here, almost all types perform poorly. Performance on the selection problem, finally, is quite good. What is interesting is that the more constrained modules (types II, III) perform less well than the standard ones. Here again the effect that the construction is closely connected to the optimal set of weights plays a role.

4.3.2. Modular Networks. When the modules are concatenated, the initial MSE of the resulting network is extremely poor: for type II, III and IV networks O(1), O(10⁻¹) and O(10⁻²), respectively. The position-of-minimum module is mainly responsible for this; it is the hardest module to learn due to the non-linearity involved. If the trained position-of-minimum module is replaced by the constructed type I module, the overall MSE always decreases significantly (see Table 1). This is an indication that, although the MSE is rather low (O(10⁻³)), this module does not perform well. Furthermore, it seems that the overall MSE is highly sensitive to the error this module makes.

However, when the networks are trained a little further as a whole with a low learning rate (0.1), the MSE improves rapidly: after only 100–500 learning cycles training can be stopped. In Pugmire et al [3], the same effect occurs. The MSEs of the final networks are shown in Fig. 9 (a), (e) and (i) for images A, B and C, respectively, as the MSE on a total image. Images B and C were pre-processed in the same way as image A: the original 8-bit (B) and 5-bit (C) 256 grey value images were converted to floating point images, with grey values in the range [−0.5, 0.5].

To get an idea of the significance of these results, re-initialised versions of the same networks were also trained. That is, the weights of the concatenated network were initialised randomly without using the prior knowledge of modularity. The results of these training runs are shown in Fig. 9 (b), (f) and (j). Note that only the type II networks cannot be trained well from scratch, due to the non-standard transfer functions used. For type III and IV networks, the MSE is comparable to that of the other networks. This would indicate that modular training is not beneficial, at least according to the MSE criterion.

Fig. 7. Images used for training (a) and testing (b, c) purposes. The left images are the originals; these images will be referred to as A, B and C, respectively. The right images are the Kuwahara-filtered versions (for image A, the training target). The original 8-bit ((a) and (b)) and 5-bit (c) 256 grey value images were converted to floating point images, with grey values in the range [−0.5, 0.5].

Fig. 8. Performance of the individual modules on both the normal and edge-favouring test sets: (a)–(b) the mean module; (c)–(d) the variance module; (e)–(f) the position-of-minimum module; and (g)–(h) the selection module.

4.3.3. One-shot Networks. Results for the one-shot networks are shown in Fig. 9 (c)–(d), (g)–(h) and (k)–(l) for images A, B and C. In each case, the first figure gives the results for networks with one hidden layer; the second figure for networks with two hidden layers. What is most striking is that for almost all sizes of the networks the MSE is more or less the same.

4.4. Discussion

The most noticeable result of these experiments is that whatever network is trained, be it a simple one hidden unit network or an especially constructed modular network, approximately the same performance (measured in MSE) can be reached. Modular training does not seem to boost performance at all. The cause for this may lie in a number of problems:

• The problem may simply be too hard to be learned by a finite-size neural network. This does not seem plausible, since even for a two-hidden-layer network with 250 hidden units per layer, resulting in a total of 69,000 free parameters, the MSE is no better than for very simple networks. One would at least expect to see some enhancement of results.

• It is very well possible that the, rather arbitrarily chosen, sample size of 1000 is too small. An experiment was set up in which a one hidden layer, 50 hidden unit network was trained using training sets with 50, 100, 250, 500, 1000 and 2000 samples. The results, given in Fig. 10, show however that the chosen sample size of 1000 is quite sufficient: using 2000 samples in the training set does not decrease the MSE much.

• The dataset may not be representative for the problem, i.e. the nature of the problem may not be well reflected in the way the set is sampled from the image. An experiment to test this hypothesis is discussed in the next section (Section 5).

• The error criterion may not be fit for training the networks. It is very well possible that the MSE criterion used is of limited use in this problem, since it weighs both the interesting parts of the image, around the edges, and the less interesting parts equally. This option is explored in Section 6.

• The problem may be of such a nature that local minima are prominently present in the error surface, while the global minima are very hard to reach, causing suboptimal network operation. This hypothesis is tested in Section 8.

5. EDGE-FAVOURING SAMPLING

A problem in sampling an image for this particular application is that the interesting regions, i.e. the regions where the filter is non-linear, are very poorly represented. As explained in Section 2, the filter is highly non-linear – and therefore hardest to learn – around the edges in an image. Unfortunately, edge pixels constitute only a very small percentage of the total number of pixels in an image (as a rule of thumb, O(√n) edge pixels on O(n) image pixels).

Fig. 9. Performance of the modular and one-shot networks on the three images used. (a)–(d) on image 7(a); (e)–(h) on image 7(b); and (i)–(l) on image 7(c).

To learn more about the influence the sampling of the training set has on the performance, a second set of datasets was created by sampling from Fig. 7(a) with a probability density function given by its gradient magnitude image. The gradient magnitude |∇I| of an image I is calculated as

  |∇I| = √( (dI/dx)² + (dI/dy)² )                                  (3)

where dI/dx is approximated by convolving with a [−1 0 1] mask, and dI/dy by convolving with its transpose. If |∇I| is scaled such that ∫_x ∫_y c·|∇I(x, y)| dy dx = 1, and used as a probability density function when sampling, edge regions have a much higher probability of being included in the data set than pixels from flat regions. This will be called edge-favouring sampling, as opposed to normal sampling.
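A sketch of this edge-favouring sampling follows. The gradient is approximated with the [−1 0 1] masks as in Eq. (3), and the normalised gradient magnitude is used directly as the sampling distribution over pixel positions; function and variable names are our own.

import numpy as np
from scipy.ndimage import convolve

def edge_favouring_positions(image, n_samples, rng=np.random.default_rng(0)):
    # Draw pixel positions with probability proportional to the gradient magnitude.
    dx = convolve(image, np.array([[-1.0, 0.0, 1.0]]))       # dI/dx
    dy = convolve(image, np.array([[-1.0], [0.0], [1.0]]))   # dI/dy
    grad_mag = np.sqrt(dx ** 2 + dy ** 2)                    # Eq. (3)
    p = grad_mag.ravel() / grad_mag.sum()                    # scale to a pdf
    idx = rng.choice(grad_mag.size, size=n_samples, p=p)
    return np.unravel_index(idx, image.shape)                # (rows, cols)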

Note that constructing a dataset in this way is equivalent to using a much larger dataset and weighing the MSE with the gradient magnitude. Therefore, this approach is comparable to using an adapted MSE criterion in training the neural network.

Table 1. Dependence of performance, in MSE on image A, on the position-of-minimum module. Values given are mean MSEs and standard deviations. 'nrm.' and 'e.f.' stand for a normal or edge-favouring training set.

Type  Train set  MSE                      MSE with Type I pos.-of-min. module
II    nrm.       9.2×10⁻¹ ± 5.2×10⁻¹      8.7×10⁻⁴ ± 1.7×10⁻⁴
      e.f.       1.1 ± 1.2×10⁻²           1.2×10⁻³ ± 5.5×10⁻⁵
III   nrm.       1.2×10⁻¹ ± 1.2×10⁻¹      1.0×10⁻³ ± 2.0×10⁻⁴
      e.f.       2.0×10⁻¹ ± 2.3×10⁻¹      1.0×10⁻³ ± 2.4×10⁻⁴
IV    nrm.       3.6×10⁻² ± 1.7×10⁻²      1.2×10⁻³ ± 2.4×10⁻⁴
      e.f.       1.0×10⁻² ± 7.0×10⁻³      1.2×10⁻³ ± 2.4×10⁻³

Fig. 10. Performance of a one hidden layer network with 50 hidden units for various training set sample sizes.

5.1. Results

Performances in MSE of networks trained on this edge-favouring set, and a comparison to MSEs of networks trained on the normal set, are given in Figs 8 and 9. The sampling of the dataset clearly has an influence on the results. Since the edge-favouring set contains more samples taken from regions around edges, the task of finding the mean is harder to learn due to the larger variation. At the same time, this eases training the position-of-minimum and selection modules. For all tasks except the mean, the final MSE around the edges (Fig. 8 (b), (d), (f) and (h)) is better than that of networks trained using a normal training set. The MSE is, in most cases, even lower on the normal test set.

Overall results for the concatenated and one-shot networks (Fig. 9) suggest that performance decreases when networks are trained on a specially selected data set (i.e. the MSE increases). However, when the quality of the filtering operation is judged by looking at the filtered images (see e.g. Fig. 11), one finds that these networks give superior results in approximating the Kuwahara filter: there is a discrepancy between performance as indicated by the MSE and perception of filter quality. Therefore, the possibility of finding another way of measuring performance for this application is investigated in the next section.

6. A PERFORMANCE MEASURE FOR EDGE-PRESERVING SMOOTHING

The results given in Section 4.3 show that it is very hard to interpret the MSE as a measure of filter performance. Although the performances found differ, they do so only slightly. However, visually the differences are quite large. If images filtered by various networks are compared, it is immediately clear which network performs better. As an example, Fig. 11 shows two filtered images. The left image was filtered by a modular neural network (type IV) trained on an edge-favouring training set. The image on the right is the output of a one hidden layer, 100 hidden unit one-shot network trained on a normal dataset. Although the MSEs are nearly equal (1.48×10⁻³ for the left image versus 1.44×10⁻³ for the right one), in the left image the edges seem much crisper and the regions much smoother than in the image on the right; that is, one would judge the filter used to produce the left image to perform better.

One would like to find a measure of filter performance which bears more relation to this qualitative judgement than the MSE. The reason why the MSE is so uninformative is that by far the largest number of pixels do not lie on edges. Figure 12(a) illustrates this: it shows that the histogram of the gradient magnitude image is concentrated near zero, i.e. most pixels lie in flat regions. Since the MSE averages over all pixels, it may be quite low while edges are poorly preserved.

The finding that the MSE does not correlate well with perceptual quality judgement is not a new one. A number of alternatives has been proposed, among which the most prominent seems to be the Mean Absolute Error (MAE). There is also a body of work on performance measures for edge detection, e.g. Pratt's Figure of Merit (FOM) [28] or Average Risk [2]. However, none of these help us capture the dual goals of edge sharpening and region smoothing present in our problem.

Fig. 11. Two network output images with details. For the left image, output of a type IV modular network trained on an edge-favouring set, the MSE was 1.48×10⁻³; for the right image, output of a one hidden layer, 100 hidden unit one-shot network trained on a normally sampled set, it was 1.44×10⁻³. The details in the middle show the target output of the Kuwahara filter; the entire target image is shown in Fig. 7(a).

Fig. 12. (a) Histograms of gradient magnitudes of the original image (Fig. 7(a)) and a Kuwahara-filtered version (k = 3); (b) scattergram of the gradient magnitude image pixel values with estimated lines.

6.1. Smoothing Versus Sharpening

In edge-preserving smoothing, two goals are pursued: on the one hand, the algorithm should preserve edge sharpness; on the other hand, it should smooth the image in regions that do not contain edges. In other words, the gradient of an image should remain the same in places where it is high² and decrease where it is low.

² Or even grow higher. If the regions divided by the edge become smoother, the gradient of the edge itself may increase, as long as there was no overshoot in the original image. Overshoot is defined as the effect of artificially sharpening edges by adding a small value to the top part of an edge and subtracting a small value from the lower part.

If the gradient magnitude |∇I| of an image I is plotted versus |∇f(I)| of a Kuwahara-filtered version f(I), for each pixel, the result will look like Fig. 12(b). In this figure, the two separate effects can be seen: for a number of points, the gradient is increased by filtering, while for another set of points the gradient is decreased. The steeper the upper cloud, the better the sharpening; the less steep the lower cloud, the better the smoothing. Note that the figure gives no indication of the density of both clouds: in general, by far the most points lie in the lower cloud, since more pixels lie in smooth regions than on edges. The graph is reminiscent of the scattergram approach discussed (and denounced) in Katsulai and Arimizu [29], but here the scattergram of the gradient magnitude images is shown.

To estimate the steepness of the clouds, this point data is first separated into two sets:

  A = { (|∇I|(i,j), |∇f(I)|(i,j)) : |∇I|(i,j) ≥ |∇f(I)|(i,j) }     (4)

  B = { (|∇I|(i,j), |∇f(I)|(i,j)) : |∇I|(i,j) < |∇f(I)|(i,j) }     (5)

Lines y = ax + b can be fitted through both sets using a robust estimation technique, minimising the absolute deviation [30], to get a density-independent estimate of the factors with which edges are sharpened and flat regions are smoothed:

  (a_A, b_A) = arg min_(a,b) Σ_{(x,y)∈A} | y − (ax + b) |          (6)

  (a_B, b_B) = arg min_(a,b) Σ_{(x,y)∈B} | y − (ax + b) |          (7)

The slope of the lower line found, a_A, will give an indication of the smoothing induced by the filter f. Likewise, a_B gives an indication of the sharpening effect of the filter. The offsets b_A and b_B are discarded, although it is necessary to estimate them to avoid biasing the estimates of a_A and a_B. Note that a demand is that a_A ≤ 1 and a_B ≥ 1, so the values are clipped at 1 if necessary.

To account for the number of pixels actually used to estimate these values, the slopes found are weighed with the relative number of points used for the estimate. Therefore, the numbers Smoothing(f, I) = |A|/(|A| + |B|) · (a′_A − 1) and Sharpening(f, I) = |B|/(|A| + |B|) · (a_B − 1) are used, where a′_A = 1/a_A was substituted to obtain numbers in the same range [0, ∞). These two values can be considered to be an attenuation factor for flat regions and an amplification factor for edges, respectively.
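The measure can be computed directly from the two gradient-magnitude images. The sketch below follows Eqs (4)–(7) and the weighting described above, but replaces the exact robust fitting routine of [30] with a plain numerical minimisation of the absolute deviations; names are our own.

import numpy as np
from scipy.optimize import minimize

def lad_line(x, y):
    # Fit y = a*x + b by minimising the sum of absolute deviations (Eqs (6)-(7)),
    # starting from the least-squares solution.
    loss = lambda p: np.sum(np.abs(y - (p[0] * x + p[1])))
    a0, b0 = np.polyfit(x, y, 1)
    return minimize(loss, x0=[a0, b0], method='Nelder-Mead').x

def smoothing_sharpening(grad_orig, grad_filt):
    # Split pixels into set A (gradient not increased) and set B (gradient
    # increased), as in Eqs (4)-(5), then weigh the clipped slopes.
    x, y = grad_orig.ravel(), grad_filt.ravel()
    in_a = x >= y
    a_A, _ = lad_line(x[in_a], y[in_a])
    a_B, _ = lad_line(x[~in_a], y[~in_a])
    a_A, a_B = min(a_A, 1.0), max(a_B, 1.0)          # demand a_A <= 1, a_B >= 1
    n_a, n_b = in_a.sum(), (~in_a).sum()
    smoothing = n_a / (n_a + n_b) * (1.0 / a_A - 1.0)
    sharpening = n_b / (n_a + n_b) * (a_B - 1.0)
    return smoothing, sharpening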

These measures cannot be used as absolute quantitative indications of filter performance, since a higher value does not necessarily mean a better performance, i.e. there is no absolute optimal value. Furthermore, the measures are highly dependent on image content and on the scaling of f(I) w.r.t. I. The scaling problem can be neglected, however, since the networks were trained to give output values in the correct range. For various filters f(I) on a certain image, these measures can now be compared, giving an indication of relative filter performance on that image. To get an idea of the range of possible values, smoothing and sharpening values for some standard filters can be calculated, like a Gaussian filter

  f_G(I, σ) = I ⊗ (1 / (√(2π) σ)) exp( −(x² + y²) / (2σ²) )        (8)

for σ = 0.0, 0.1, ..., 2.0; an unsharp masking filter,

  f_U(I, k) = I − k × ( I ⊗ [ −1  2 −1
                               2 −4  2
                              −1  2 −1 ] )                         (9)

which subtracts k times the Laplacian from an image (k = 0.0, 0.1, ..., 2.0), and the Kuwahara filter itself.
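These calibration filters are easy to reproduce. The sketch below implements Eqs (8) and (9) using scipy; the Gaussian is applied with the standard (unit-sum) gaussian_filter, which differs in normalisation from the form written in Eq. (8) but not in its smoothing effect.

import numpy as np
from scipy.ndimage import gaussian_filter, convolve

LAPLACIAN = np.array([[-1.0,  2.0, -1.0],
                      [ 2.0, -4.0,  2.0],
                      [-1.0,  2.0, -1.0]])   # kernel of Eq. (9)

def gaussian_smooth(image, sigma):
    # Eq. (8): convolution with a Gaussian kernel of scale sigma
    return gaussian_filter(image, sigma)

def unsharp_mask(image, k):
    # Eq. (9): subtract k times the Laplacian-filtered image
    return image - k * convolve(image, LAPLACIAN)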

6.2. Results

Smoothing and sharpening performance values were calculated for all the networks discussed in Section 4. The results are shown in Fig. 13. Firstly, the calibration lines give an indication of the range of possible values. As expected, the Gaussian filter on Fig. 7(a) and (b) gives high smoothing values and low sharpening values, while the unsharp masking filter gives low smoothing values and high sharpening values. The Kuwahara filter scores high on smoothing and low on sharpening. This is exactly as it should be: the Kuwahara filter should smooth while preserving the edges; it should not necessarily sharpen them. If networks have a higher sharpening value, they are usually producing overshoot around the edges in the output images.

The measures calculated for Fig. 7(c) show the limitations of the method. In this image, there is a large number of very sharp edges in an otherwise already rather smooth image. For this image, the Gaussian filter gives only very low smoothing values and the unsharp masking filter gives no sharpening value at all. This is due to the fact that, for this image, subtracting the Laplacian from the image produces a very small sharpening value, together with a negative smoothing value, caused by the Laplacian greatly enhancing the amount of noise in the image. Since the values were clipped at 0, the results do not show in the figure.

Regarding the networks, some things now become clear:

• first, the hand-constructed networks (type I) almost perfectly mimic the Kuwahara filter, according to the new measures. However, as soon as the hand-set weights are dropped (type II), performance drops drastically. Apparently the non-standard transfer functions and special architectures inhibit the networks too much. Type III networks perform better and generalise well to other images. Only for type IV, in which only the knowledge that the algorithm is modular is used, are the results good on all images;

• no other network in this study seems to be able to approximate the Kuwahara filter well. The best trained network still performs much worse;

• most one-shot networks perform poorly. Only for networks with two hidden layers and 10, 25 and 50 hidden units per layer is performance reasonable. In retrospect, this concurs with the drop in the MSE that can be seen in Fig. 9(d), although the differences are very small;

• there is an optimal number of hidden units for the one-shot networks; in this case, 50. A hypothesis is that this depends on the learning conditions, since these parameters are not optimised for each network. To verify this, the same set of one-shot networks was trained in experiments in which the weights were initialised using random values drawn from a [−1.0, 1.0] distribution, using a learning rate of 0.5. Now, the optimal number of hidden units was found to be 25, with all other networks performing very poorly;

• edge-favouring sampling has a very strong influence. Most of the architectures discussed only perform reasonably when trained on such a set, especially the one-shot networks;

• generalisation is, for all networks, reasonable. Even on image C, which differs substantially from the training image A, performance is quite good. The best one-shot network (two hidden layers, 50 units per hidden layer) seems to generalise a little better than the modular networks.

7. INSPECTION OF MODULAR NETWORKS

It would be interesting to see whether the modular networks still use their initialisation. That is, are the modules still performing the functions they were initially trained on, or has the network – after being trained further for a while – found a better solution? To inspect the networks, the concatenated networks were evaluated on the initial dataset and the outputs of the individual modules were recorded. Figures 14 and 15 show some examples of such plots.

Fig. 13. Performance of the modular and one-shot networks on the three images used. (a)–(d) On image 7(a); (e)–(h) on image 7(b); and (i)–(l) on image 7(c). In each row, the first figure compares the results of the Kuwahara, Gaussian and unsharp masking filters to network performance. The second figure depicts results of the modular networks; the third and fourth figures show results for one-shot networks with one hidden layer and two hidden layers, respectively. In the legends, 'e.f.' stands for networks trained on edge-favouring data sets, as opposed to normally sampled data sets (normal); 'further' indicates networks initialised by training the individual modules, as opposed to networks trained from scratch (re-init); and '10', '25' and so on denote the number of units per hidden layer. For reasons of presentation, the results for the one-shot networks with 1 through 5 units per hidden layer are not shown here; these were never better than those of networks with 10 units.

Unfortunately, detailed inspection is very hard. Ideally, if each module were performing exactly the function it was trained to perform, each plot would show a straight line y = x. The plots show that this is, in most cases, not the case. However, it is possible to make some general remarks about the differences between the various ways of training the networks. These differences are most clear for the mean and selection modules:

• For well-performing networks, the mapping in each module is no longer evident. Instead, it seems these modules make rather good use of their non-linearity (Fig. 14(c)). The poorly performing networks still show a reasonably linear behaviour (Fig. 14(f)).

Fig. 14. Progressively more freedom: the mean module trained using the normal training set, for type II (a), III (b) and IV (c), and trained using the edge-favouring training set, for type IV, using the modular initialisation (d) and trained from scratch (e); the selection module trained on the normal training set, for type II (f), III (g) and IV (h), and trained on the edge-favouring set, for type II (i), III (j) and IV (k). The different markers indicate the different output units.

Fig. 15. Unused or badly used modules: the variance module trained on the normal dataset, for type II (a) and IV (b), and trained on the edge-favouring set for type IV (c); the position-of-minimum module trained further on the normal dataset for type IV (d) and trained from scratch on the same set (e). The different markers indicate the different output units. Note: in the latter two figures, the only desired output is either 0.0 or 0.5; a small offset has been added for the different units for presentation purposes.

• There is a progressive increase in non-linearity for types II, III and IV (Fig. 14(a)–(c), (f)–(h) and (i)–(k)). The added complexity allows the modules more flexibility when they are trained further. Note, however, that the basic mapping is still preserved, i.e. the trend is still visible for all units.

• There is an increase in non-linearity when networks are trained on the edge-favouring set instead of the normal set (Fig. 14(f)–(h) vs. (i)–(k)).

• The networks trained from scratch generally do not find the modular structure (Fig. 14(d)–(e)).

For the variance and position-of-minimum modules, the differences are less clear. Most of these modules seem to have no function left in the final networks: the outputs are clamped at a certain value or vary a little in a small region around a value. For the variance module, only type IV modules have enough flexibility. Here, too, training on the edge-favouring set increases the non-linearity of the output (Fig. 15(a)–(c)). The module for finding the position of the minimum, finally, is clamped in almost all architectures. Only type IV modules give some variation in output (Fig. 15(d)–(e)). Networks trained from scratch are always clamped, too.

In conclusion, it seems that in most networks the modules on the right side (variance and position-of-minimum) are disabled. The high difficulty of the position-of-minimum task is probably responsible for this phenomenon. However, the networks that do show some activity in these modules are the networks that perform best, indicating that the modular initialisation is useful.

8. ONE-SHOT NETWORKS

To gain insight into the relatively poor performance of most of the one-shot networks according to the performance measure introduced in Section 6, a very simple architecture was created, containing only a small number of weights (see Fig. 16). Since the Kuwahara filter should be isotropic, a symmetry can be imposed on the weights. Weights which should be the same, indicated in Fig. 16 by the same letter, were set the same using the technique of weight sharing [9,24]. Furthermore, linear transfer functions were used to avoid the complications introduced in the analysis by the use of sigmoids. No bias was used.

This network was trained on the normal dataset, using a validation set. The learned weight matrix is shown in Fig. 17(a). In filtering terms, the main component looks like a negative second derivative of a Gaussian (i.e. the negative values around the centre and the slightly positive values in the four corners). By fitting various models it became clear that this filter most closely resembles a mixture of a normal Gaussian and the second derivative of a Gaussian.

Fig. 16. The most simple neural network to perform a Kuwahara filtering. It has a 5 × 5 neuron input layer and one output unit without bias. The network contains six independent weights, indicated in the weight matrix by the letters A through F.

Fig. 17. (a) A weight matrix found by training the network shown in Fig. 16; (b) the weight matrix generated by the fitted model (c_1 = 1.41, σ_1 = 2.77, c_2 = 1.38, σ_2 = 0.99); (c) a cross-section of the model at x = 0.

This can well be explained by looking at the training objective. The Kuwahara filter smoothes images while preserving the edges. The Gaussian is a smoothing filter, while its second derivative, the Laplacian, emphasises edges when subtracted from the original. Therefore, the following model for the filter learned was set up:

  f(x, y) = c_1 (1 / (√(2π) σ_1)) exp( −(x² + y²) / (2σ_1²) )
           − c_2 ((x² + y²) − σ_2²) / (√(2π) σ_2⁵) exp( −(x² + y²) / (2σ_2²) )     (10)

in which c_1 and σ_1 are parameters to be estimated for the Gaussian and c_2 and σ_2 are parameters for the Laplacian. Figure 17(c) shows these two functions.

A Gauss–Newton fitting procedure was used to find the parameters of f(x, y) given the weight matrix shown in Fig. 17(a). The resulting model weight matrix is shown in Fig. 17(b) and a cross-section is shown in Fig. 17(c). Although the fit (c_1 = 1.41, σ_1 = 2.77, c_2 = 1.38, σ_2 = 0.99) is not perfect, the correlation between the model and the actual weights is quite high (0.93).
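The fitted model is straightforward to evaluate on the 5 × 5 grid. The sketch below builds the model weight matrix of Eq. (10) from given parameters and compares it to a learned weight matrix; the Gauss–Newton fit itself is not reproduced here, and the names are our own.

import numpy as np

def model_weights(c1, s1, c2, s2, size=5):
    # Eq. (10): a Gaussian minus a (radial) second derivative of a Gaussian,
    # sampled on a size x size grid centred on the origin.
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    r2 = x ** 2 + y ** 2
    gauss = c1 / (np.sqrt(2 * np.pi) * s1) * np.exp(-r2 / (2 * s1 ** 2))
    log_part = c2 * (r2 - s2 ** 2) / (np.sqrt(2 * np.pi) * s2 ** 5) * np.exp(-r2 / (2 * s2 ** 2))
    return gauss - log_part

# correlation with a learned 5 x 5 weight matrix W, using the fitted values from the text:
# model = model_weights(1.41, 2.77, 1.38, 0.99)
# corr = np.corrcoef(model.ravel(), W.ravel())[0, 1]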

Our hypothesis was that this solution, i.e. applying a Gaussian and a Laplacian, was a local minimum which most of the networks had converged to. The same procedure was applied to each of the hidden units in each of our one-hidden-layer one-shot networks. For each unit, the model was fitted and the MSE and correlation between the actual weight matrix and the model weight matrix were calculated.

8.1. Results

The results, given in Fig. 18, show that, at least for the smaller networks, our hypothesis is supported by the data. For the networks trained on the normal data set, in a large range of neural network sizes (i.e. 1, 2, 3, 4, 5, 10 and 25 hidden units) the model closely fits each hidden unit. Only for larger network sizes does the fit become worse. The reason for this is that in these networks many units have a weight distribution which is very hard to interpret, yet with small weights. These units do not play a large role in the final neural network output.

For the networks trained on the edge-favouring data set the fit is less good, but still gives a reasonable correlation. Unfortunately, these results do not match the peak which occurred in network performance according to our new performance measure. An opposite effect plays a role here: the larger the networks, the harder they become to train.

9. CONCLUSIONS

Fig. 18. A comparison between the actual weights in one-shot networks and the fitted models, for both one and two hidden layer networks: the median MSE ((a), (c)) and the mean correlation ((b), (d)). The median MSE is shown in (a) and (c), since the mean MSE is rather uninformative due to some high deviations.

A number of experiments on implementing a basic non-linear filter have been presented. Since this filter, the Kuwahara filter for edge-preserving smoothing, is of a modular algorithmic nature, modular versions of the network could be constructed as well as standard feed-forward networks. A gradual shift in network performance was expected as the networks were less and less constrained, at the same time losing the possibility of understanding the workings of the networks. A first set of experiments was reported, which showed that whatever network was trained, performance as measured in the MSE was more or less the same. A number of hypotheses was proposed for this phenomenon: that the data set and error measure may not accurately represent the finer points of this particular problem, or that all networks have reached local minima, simply since the problem is too hard. Testing these hypotheses, it was shown that:

• using a different way of sampling the images, i.e. mainly in regions around the edges, proves to be of great benefit;

• using a performance measure which does not average over all pixels, but takes the two goals of edge-preserving smoothing into account, gives a better insight into relative filter performance;

• the smaller one-shot networks have learned a linear approximation of the Kuwahara filter's goals, i.e. they have reached a local minimum;

• in the poorly performing modular networks, the modules still perform the functions they were trained on. The better performing modular networks retain some of their initialisation, but have adapted further to a point where the function of individual modules is no longer clear. The better the performance of the final network (according to the new measure), the less clearly the initialisation is retained.

In the attempts to understand the operation of a neural network instead of treating it like a black box, the uncertainty principle again played a role. For the modular networks, as soon as some of the constraints were dropped, network performance became much worse: there was no graceful degradation. It was also shown that it is hard to interpret the operation of the modular network after training it a while further; the operation of the network is distributed differently than in our modular initialisation. The one thing we can say is that using the prior knowledge of the modular nature of the problem helps to avoid painstaking optimisation of the number of hidden layers and units, which was shown to be quite critical in the one-shot networks.

The most important lesson is that a straightforward application of neural networks to this kind of problem can give good results. However, careful use of prior knowledge, selection of network architecture and sampling of the training set are prerequisites for good operation. In addition, the standard error measure used, the MSE, will not indicate a network performing poorly. Unimportant deviations in the output image may lead to the same MSE as significant ones, if there is a large number of unimportant deviations and a smaller number of important ones. Consequently, standard feed-forward neural networks trained by minimising the traditional MSE are unfit for designing adaptive non-linear image filtering operations; other criteria should be developed to facilitate easy application of neural networks in this field.
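As a purely hypothetical illustration of this effect: in a 100 × 100 image, 1000 pixels that each deviate by 3 grey values in flat regions contribute a summed squared error of 1000 · 3² = 9000, while 90 pixels that each deviate by 10 grey values on edges contribute 90 · 10² = 9000; both cases give the same MSE of 0.9, although only the second visibly harms the edge-preserving behaviour.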

9.1. Further Research

Some variation is possible on our proposed solution to the training set sampling problem, edge-favouring sampling. Pugmire et al [3] claim that learning should be structured, i.e. start with the general problem and then proceed to special cases. This can easily be accomplished with our sampling method by adding a constant to each pixel in the gradient magnitude image before scaling it for use as a probability density function, as sketched below. If this constant is gradually lowered, edge pixels become better represented in the training set.
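A minimal sketch of this sampling scheme (the function name, the constant c and the random-number interface are illustrative choices, not taken from the paper):

```python
import numpy as np

def edge_favouring_positions(image, n_samples, c, rng=None):
    """Draw n_samples pixel positions with probability proportional to
    the gradient magnitude plus a constant c. A large c gives nearly
    uniform sampling (the general problem); lowering c concentrates
    the samples around edges (the special cases)."""
    rng = np.random.default_rng() if rng is None else rng
    gy, gx = np.gradient(image.astype(float))
    prob = np.hypot(gx, gy) + c
    prob /= prob.sum()                      # scale to a probability density
    idx = rng.choice(image.size, size=n_samples, p=prob.ravel())
    return np.unravel_index(idx, image.shape)

# Structured learning: start with a large constant, then lower it
# in later training stages, e.g. c = 10.0, 1.0, 0.1, 0.0.
```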

The performance measure for edge-preserving smoothing introduced in this paper seems to capture the essence of the operation well. The measure should be benchmarked by applying it to different filters on a wide range of images. Furthermore, the relationship with human judgement could be investigated by conducting psychological experiments; hopefully, areas of equal perception of image quality can be found in the space spanned by the smoothing and sharpening measures. Perhaps the measure can even be used in training the neural network, although the very non-linear nature which makes it work seems to make this difficult; one would have to look at other learning procedures, e.g. a form of reinforcement learning [31,32].



Finally, although all results shown in this paper suggest that neural networks perform poorly in edge-preserving smoothing, the perceptual quality of the resulting filtered images is quite good. Perhaps it is the very fact that these neural networks have only partially succeeded in capturing the non-linearity of the Kuwahara filter that causes this. This could be construed as an advantage: constrained non-linear parametric approximations to highly non-linear filtering algorithms may give better perceptual results than the real thing, which is, after all, only a means to an end.

Acknowledgements

This research is partly supported by the Foundation for Computer Science in the Netherlands (SION), the Dutch Organisation for Scientific Research (NWO) and the Royal Netherlands Academy of Arts and Sciences (KNAW).

References

1. Hornik K. Multilayer feedforward networks are universal approximators. Neural Networks 1989; 2: 359–366

2. Spreeuwers LJ. Image filtering with neural networks, applications and performance evaluation. PhD thesis, Universiteit Twente, Enschede, October 1992

3. Pugmire RH, Hodgson RM, Chaplin RI. The properties and training of a neural network based universal window filter developed for image processing tasks. In Amari S, Kasabov N (eds), Brain-like Computing and Intelligent Information Systems, Chapter 3, Springer-Verlag, Singapore, 1998, pp 49–77

4. Shih F, Moh J. Implementing morphological operations using programmable neural networks. Pattern Recognition 1992; 25(1): 89–99

5. Yin L, Astola J, Neuvo Y. A new class of nonlinear filters – neural filters. IEEE Transactions on Signal Processing 1993; 41(3): 1201–1222

6. Jahn H. A neural network for image smoothing and segmentation. In Amin A, Dori D, Pudil P, Freeman H (eds), Advances in Pattern Recognition, Proceedings of the IAPR workshops on Structural and Statistical Pattern Recognition '98 (SSPR'98) and Statistical Techniques in Pattern Recognition '98 (SPR'98), IAPR, Springer-Verlag, Berlin, 1998, pp 329–338

7. Perona P, Shiota T, Malik J. Anisotropic diffusion. In ter Haar Romeny B (ed), Geometry-driven Diffusion in Computer Vision, Kluwer Academic, Dordrecht, 1994, pp 73–92

8. Fukushima K, Miyake S. Neocognitron: a new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition 1982; 15(6): 455–469

9. Le Cun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD. Backpropagation applied to handwritten zip code recognition. Neural Computation 1989; 1: 541–551

10. Luo F-L, Unbehauen R. Applied Neural Networks for Signal Processing. Cambridge University Press, Cambridge, 1997

11. Heidemann G, Ritter H. A neural 3-D object recognition architecture using optimized Gabor filters. Proceedings of the 13th IAPR International Conference on Pattern Recognition, Vol. IV, Los Alamitos, CA. IAPR, IEEE Press, 1996, p 70

12. Haring B. Adaptive image segmentation. PhD thesis, Universiteit Utrecht, Utrecht, 1997

13. de Ridder D, Schutte K, Schwering P. Vehicle recognition in infrared images using shared weights neural networks. Optical Engineering 1998; 37(3): 847–857

14. Kohonen T. Self-Organizing Maps. Springer-Verlag, Heidelberg, 1995

15. Rao RPN, Ballard DH. Efficient encoding of natural time varying images produces oriented space-time receptive fields. Technical Report 97.4, National Resource Laboratory for the Study of Brain and Behaviour, Department of Computer Science, University of Rochester, NY, August 1997

16. Obellianne C, Fogelman Soulie F, Galibourg G. Connectionist models for image processing. In Simon JC (ed), From Pixels to Features, A Workshop held at Bonas, France, 22–27 August 1988. North-Holland, Amsterdam, 1988, pp 185–196

17. Kuwahara M, Hachimura K, Eiho S, Kinoshita M. Digital Processing of Biomedical Images, Plenum Press, New York, 1976, pp 187–203

18. Anand R, Mehrotra K, Mohan CK, Ranka S. Efficient classification for multiclass problems using modular neural networks. IEEE Transactions on Neural Networks 1995; 6(1): 117–124

19. Nagao M, Matsuyama T. A Structural Analysis of Complex Aerial Photographs. Plenum Press, New York, NY, 1980

20. Tomita F, Tsuji S. Extraction of multiple regions by smoothing in selected neighbourhoods. IEEE Transactions on Systems, Man and Cybernetics 1977; SMC-7: 107–109

21. Bishop CM. Neural Networks for Pattern Recognition. Oxford University Press, Oxford, 1995

22. Hertz J, Krogh A, Palmer RG. Introduction to the Theory of Neural Computation. Addison-Wesley, Reading, MA, 1991

23. Green CD. Are connectionist models theories of cognition? Psycoloquy 1998; 9(4)

24. de Ridder D. Shared weights neural networks in image analysis. Master's thesis, Delft University of Technology, March 1996. (Download from http://www.ph.tn.tudelft.nl/~dick/papers.html.)

25. de Ridder D, Hoekstra A, Duin RPW. Feature extraction in shared weights neural networks. In Kerckhoffs EJH et al (eds), Proceedings of the 2nd Annual Conference of the Advanced School for Computing and Imaging (ASCI), Lommel, Belgium, June 5–7 1996, pp 289–294

26. Geman S, Bienenstock E, Doursat R. Neural networks and the bias-variance dilemma. Neural Computation 1992; 4(1): 1–58

27. Rumelhart DE, Hinton GE, Williams RJ. Learning internal representations by error propagation. In Rumelhart DE, McClelland JL (eds), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol I. MIT Press, Cambridge, MA, 1986

28. Pratt WK. Digital Image Processing. Wiley, New York, 1991

29. Katsulai H, Arimizu N. Evaluation of image fidelity by means of the fidelogram and level mean-square error. IEEE Transactions on Pattern Analysis and Machine Intelligence 1981; 3(3): 337–347

30. Press WH, Flannery BP, Teukolsky SA, Vetterling WT. Numerical Recipes in C. Cambridge University Press, Cambridge, 1988

31. Williams RJ. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 1992; 8: 229–256

32. Gullapalli V. A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Networks 1990; 3(6): 671–692

Dick de Ridder received his MSc degree in 1996 from the Department of Computer Science of the Delft University of Technology, The Netherlands, where he is currently a PhD student in the Pattern Recognition Group at the Department of Applied Physics. His research activities include statistical pattern recognition, image processing and in particular the application of neural network techniques in the field of non-linear image processing.

Robert P.W. Duin studied applied physics at Delft University of Technology in the Netherlands. In 1978 he received his PhD degree for a thesis on the accuracy of statistical pattern recognisers. In his research he included various aspects of the automatic interpretation of measurements, learning systems and classifiers. Between 1980 and 1990 he developed and studied hardware architectures and software configurations for interactive image analysis. At present he is with the Pattern Recognition Group, in the Department of Applied Physics at the Delft University of Technology. His main research interest is in the design and evaluation of learning algorithms for pattern recognition applications. This includes in particular neural network classifiers, support vector classifiers and classifier combining strategies.

Lucas J. van Vliet studied applied physics at the Delft University of Technology in The Netherlands. His PhD thesis (cum laude, 1993), entitled 'Grey-scale measurements in multi-dimensional digitized images', presents novel methods for sampling-error free measurements of geometric object features. He has worked on various sensor, restoration and measurement problems in quantitative microscopy. He is currently with the Pattern Recognition Group of the Department of Applied Physics, at the Delft University of Technology. His current research interests include segmentation and analysis of objects, textures and structures in digitized images.

Piet W. Verbeek studied physics at Leyden University, The Netherlands. His PhD thesis (1973) was on quantum statistics and systems theory of magnetic relaxation. In 1973 he joined the Pattern Recognition Group of the Department of Applied Physics at the Delft University of Technology. Since 1974 he has worked on image processing. Some topics: 3D skeletonisation (1978), cell nucleus texture analysis (1979), texture segmentation (1980), alpha-hull (1981), video speed range sensor system (since 1985), max-min filtering (1988), distance transform and robot collision avoidance (1986–91), measurement in 2D and 3D grey images (since 1985), texture analysis in 2D and 3D grey images (since 1994).

Correspondence and offprint requests to: Dick de Ridder, Pattern Recognition Group, Applied Physics Department, Delft University of Technology, Lorentzweg 1, 2628 Delft, The Netherlands. Email: dick@ph.tn.tudelft.nl

