
Behaviour-based Clustering of Neural Networks applied to Document Enhancement

F. Zamora Martínez, S. España Boquera, M.J. Castro Bleda

Dep. Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Spain

20-22 June 2007, San Sebastián, Spain

This work has been partially supported by the Spanish Government under contract TIN2006-12767 and by the Generalitat Valenciana under contract GVA06/302.

F. Zamora, S. España, M.J. Castro (UPV), IWANN 2007, 20-22 June 2007, San Sebastián


Index

1 Abstract

2 Motivation

3 Behaviour-based Clustering of Supervised Classifiers
   Agglomerative Hierarchical Clustering
   Behaviour-based Clustering

4 An Application of Behaviour-based Clustering of MLPs to Document Enhancement

5 Experimentation
   “Noisy Office”: Simulated Noisy Image Dataset
   Agglomerative Hierarchical Clustering of MLPs
   Training a classifier for the N types of neural clustered filters
   Evaluation of the Enhancement System

6 Summary and Conclusions


Abstract

This work proposes an agglomerative hierarchical clustering algorithm where the items to be clustered are MLPs, and the measure of similarity used to compare the MLPs is based on their behaviour.

This algorithm has been applied to document enhancement.

HOW:

1 A set of MLPs is TRAINED as noise filters for different types of noise and then CLUSTERED into groups to obtain a reduced set of filters.

2 In order to SELECT which clustered filter is the most suitable to clean and enhance a real noisy image, an image classifier is also trained using multilayer perceptrons.


Motivation

The field of off-line optical character recognition (OCR) has been a topic of intensive research for many years.

Document enhancement not only influences the overall performance of OCR systems but can also significantly improve document readability for human readers.


PROBLEM: In many cases, the noise in document images is heterogeneous, and a technique fitted to one type of noise may not be valid for the whole set of documents.

SOLUTION: Use several filters or techniques and provide a classifier to select the appropriate one.

Neural networks have been used for document enhancement. One advantage of neural network filters for image enhancement and denoising is that a different neural filter can be trained for each type of noise. This work proposes clustering the neural network filters in order to avoid having to label the training data and to reduce the number of filters needed by the enhancement system. An agglomerative hierarchical clustering algorithm of supervised classifiers is proposed for this purpose.


Agglomerative Hierarchical Clustering

Agglomerative hierarchical clustering makes very few assumptions about the data. It builds a hierarchical structure by iteratively merging clusters according to a given dissimilarity measure, starting from singletons until no further merging is possible (one general cluster).

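[Figure: dendrogram over five items A, B, C, D and E, with dissimilarity on the vertical axis; the merged clusters shown are A+B, D+E, A+B+C and A+B+C+D+E.]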


Agglomerative Hierarchical Clustering: The algorithm

1 Initialization: M singletons as M clusters.

2 Compute the dissimilarity distance between every pair of clusters.

3 Iterative process:
   a) Determine the closest pair of clusters i and j.
   b) Merge the two closest clusters selected in (a) into a new cluster i + j.
   c) Update the dissimilarity distances from the new cluster i + j to all the other clusters.
   d) If more than one cluster remains, go to step (a).

4 Select the number N of clusters according to a given criterion.
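The four steps above can be sketched in Python as follows. This is a minimal illustration, assuming the dissimilarity and merge operations are passed in as functions; `agglomerative_clustering`, `partition_with` and the toy data are hypothetical names chosen for this sketch, not the authors' implementation. The loop records, for every merge, the resulting partition and the distance of the merged pair, so step 4 reduces to picking one entry of that history.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")


def agglomerative_clustering(
    items: List[T],
    dissimilarity: Callable[[T, T], float],
    merge: Callable[[T, T], T],
) -> List[Tuple[List[T], float]]:
    """Return, for every merge, the partition after it and the merged pair's distance."""
    clusters: Dict[int, T] = dict(enumerate(items))  # step 1: M singletons, keyed by id
    next_id = len(items)
    ids = list(clusters)
    # Step 2: dissimilarity between every pair of clusters (key (a, b) with a < b).
    dist = {(a, b): dissimilarity(clusters[a], clusters[b])
            for n, a in enumerate(ids) for b in ids[n + 1:]}
    history = []
    while len(clusters) > 1:  # step 3(d): repeat while more than one cluster remains
        (i, j), d = min(dist.items(), key=lambda kv: kv[1])  # 3(a): closest pair
        merged = merge(clusters.pop(i), clusters.pop(j))     # 3(b): new cluster i + j
        # 3(c): forget distances involving i or j, then add distances to the new cluster.
        dist = {k: v for k, v in dist.items() if i not in k and j not in k}
        for other_id, other in clusters.items():
            dist[(other_id, next_id)] = dissimilarity(other, merged)
        clusters[next_id] = merged
        next_id += 1
        history.append((list(clusters.values()), d))
    return history


def partition_with(history: List[Tuple[List[T], float]], n: int) -> List[T]:
    """Step 4: the partition with n clusters (requires 1 <= n < M)."""
    num_items = len(history) + 1
    return history[num_items - n - 1][0]


# Hypothetical toy usage: each "classifier" is just its training data (a list of
# numbers), its behaviour is summarised by the mean, and merging concatenates the
# data, standing in for "retrain on the union of both training sets".
def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


data = [[0.0], [0.1], [5.0], [5.2], [10.0]]
history = agglomerative_clustering(
    data,
    dissimilarity=lambda a, b: abs(mean(a) - mean(b)),
    merge=lambda a, b: a + b,
)
```

With these toy items the 3-cluster partition groups {0.0, 0.1}, {5.0, 5.2} and {10.0}. Recomputing only the distances to the new cluster in step 3(c) matters when, as in the slides, every merge triggers a retraining and every distance requires running classifiers.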


Behaviour-based Clustering

In this case, the items are supervised classifiers, namely MLPs.

The dissimilarity distance between two clusters can be based on the behaviour of the classifiers with respect to a validation dataset.

To merge the closest pair of clusters, a new classifier is trained with the associated training data of the two merged clusters.
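A behaviour-based dissimilarity compares what two classifiers output on the same validation inputs, not their weights. The sketch below assumes each classifier is a function from an input vector to an output vector and averages a weighted Euclidean distance between outputs over the validation set (the weighted Euclidean distance is the one named later for the document-enhancement application; uniform weights are an assumption here). The function names and the two toy classifiers are hypothetical.

```python
import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]
Classifier = Callable[[Vector], Vector]


def weighted_euclidean(a: Vector, b: Vector, w: Optional[Vector] = None) -> float:
    """sqrt(sum_k w_k * (a_k - b_k)^2); uniform weights when w is None (assumption)."""
    if w is None:
        w = [1.0] * len(a)
    return math.sqrt(sum(wk * (ak - bk) ** 2 for wk, ak, bk in zip(w, a, b)))


def behaviour_dissimilarity(
    f: Classifier,
    g: Classifier,
    validation: Sequence[Vector],
    w: Optional[Vector] = None,
) -> float:
    """Average distance between the outputs of f and g on the same validation inputs."""
    total = sum(weighted_euclidean(f(x), g(x), w) for x in validation)
    return total / len(validation)


# Two hypothetical toy "classifiers" that differ by a constant offset on every output.
def identity(x: Vector) -> Vector:
    return list(x)


def shifted(x: Vector) -> Vector:
    return [v + 1.0 for v in x]


validation_set = [[0.0, 0.0], [0.5, 1.0]]
d_same = behaviour_dissimilarity(identity, identity, validation_set)
d_diff = behaviour_dissimilarity(identity, shifted, validation_set)
```

Because the measure only needs the classifiers' outputs, it can be plugged into a generic agglomerative loop as the `dissimilarity` function regardless of the networks' internal architectures.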


An Application: Document Enhancement

Architecture of the artificial neural network used to enhance and clean images. The entire image is cleaned by scanning it with the neural network.
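Scanning an image with a neural filter means computing each output pixel from a small neighbourhood of the corresponding input pixel. The sketch below assumes a square window of radius 2 (5 × 5 pixels) and replicates border pixels at the image edges; both choices are assumptions, since the architecture figure is not reproduced here. `clean_image` is a hypothetical name, and `threshold_centre` is a trivial stand-in for a trained MLP.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale rows, values in [0, 1]


def clean_image(
    image: Image,
    pixel_filter: Callable[[Sequence[float]], float],
    radius: int = 2,  # assumed window radius: (2 * radius + 1) ** 2 inputs
) -> Image:
    """Each output pixel is pixel_filter applied to the neighbourhood of the input pixel."""
    h, w = len(image), len(image[0])

    def pixel(r: int, c: int) -> float:
        # Replicate border pixels when the window falls outside the image (assumption).
        return image[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    return [
        [
            pixel_filter([pixel(r + dr, c + dc)
                          for dr in range(-radius, radius + 1)
                          for dc in range(-radius, radius + 1)])
            for c in range(w)
        ]
        for r in range(h)
    ]


# Hypothetical stand-in for a trained MLP: threshold the centre pixel of the window.
def threshold_centre(window: Sequence[float]) -> float:
    return 1.0 if window[len(window) // 2] > 0.5 else 0.0


noisy = [[0.9, 0.2, 0.7],
         [0.6, 0.1, 0.95]]
cleaned = clean_image(noisy, threshold_centre)
```

A real neural filter would use the whole window rather than only its centre; the point of the sketch is the scanning loop, which is the same for any filter mapping a window to one pixel.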


The whole process

1 Obtain N neural clustered filters and N associated groups of images by hierarchical clustering.
   a) Given a set of M unclassified pairs of noisy and clean images, a specific neural filter is trained for every image.
   b) Apply the generic agglomerative hierarchical clustering algorithm for MLPs.

2 Obtain a classifier for the N types of neural clustered filters. Once the number of neural filters has been selected, a filter classifier is needed to select the appropriate filter to clean and enhance a new image → MLP classification.

3 Denoise and enhance a real noisy image. Finally, when a real noisy image is to be cleaned, a clustered filter is selected with the filter classifier and then applied to the image.


Real “Noisy Office” dataset

Database of printed documents with typical office noise: 72 types of noisy documents obtained by crossing the following parameters:

type of noise (folded sheets, wrinkled sheets, coffee stains and footprints), font type (typewriter, serif, roman), emphasized or non-emphasized font, and font size (footnote size, normal, large).
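The count of 72 document types follows directly from crossing the four parameters: 4 noise types × 3 font types × 2 emphasis options × 3 font sizes = 72. The short labels below are shorthand chosen for this sketch, not identifiers from the dataset.

```python
from itertools import product

# Shorthand labels (hypothetical) for the parameter values listed on the slide.
noise_types = ["folded", "wrinkled", "coffee", "footprints"]
font_types = ["typewriter", "serif", "roman"]
emphasis = ["plain", "emphasized"]
font_sizes = ["footnote", "normal", "large"]

document_types = list(product(noise_types, font_types, emphasis, font_sizes))
num_types = len(document_types)  # 4 * 3 * 2 * 3 = 72
```

This is also why the clustering starts from M = 72 singletons: one specific filter per document type.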


Simulated “Noisy Office” dataset

The filtering process is based on MLPs that require a corpus of training pairs “(clean image, noisy image)”. It is much easier to obtain a simulated noisy image from a clean one than to clean noisy images or to estimate a document degradation model.

Example: simulated noising process for “coffee-noise”.
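The slides show the simulation only as example figures, so the exact procedure is not given. As one plausible way to build “(clean image, noisy image)” pairs, the sketch below assumes a “darken” blend: each noisy pixel is the pixel-wise minimum of the clean page and a scanned noise texture, so black ink stays black while the white background picks up the stain. This is an assumption, not the authors' method, and it is better suited to stain-like noise than to geometric effects such as folds. `simulate_noisy`, `make_training_pair` and the toy arrays are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale, 0.0 = black ink, 1.0 = white paper


def simulate_noisy(clean: Image, noise: Image) -> Image:
    """Assumed darken blend: keep the darker of the clean page and the noise texture."""
    return [[min(c, n) for c, n in zip(clean_row, noise_row)]
            for clean_row, noise_row in zip(clean, noise)]


def make_training_pair(clean: Image, noise: Image) -> Tuple[Image, Image]:
    """Return a (clean image, noisy image) pair for training a filter."""
    return clean, simulate_noisy(clean, noise)


clean_page = [[1.0, 0.0], [1.0, 1.0]]
coffee_stain = [[0.6, 0.6], [1.0, 0.8]]  # hypothetical scanned stain texture
pair = make_training_pair(clean_page, coffee_stain)
```

The same clean page can be combined with many different noise textures, which is what makes simulated pairs cheaper to produce than cleaning real noisy scans by hand.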


Simulated “Noisy Office” dataset (II)

Example: simulated noising process for “wrinkle-noise”.


Agglomerative Hierarchical Clustering of MLPs

1 Initialization. Each initial singleton is an MLP filter trained specifically for one type of noise (we started with M = 72 types of filters).

2 The dissimilarity distance between two filters is the distance between the images cleaned by each of them (weighted Euclidean distance).

3 To merge the closest pair of filters, a new MLP was trained with the associated training data of the two merged clusters.

4 To select the number of clusters → look for points where an abrupt growth in the dissimilarity distance occurs.
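Step 4 looks for abrupt growth in the merge distances. The slides describe inspecting the curve; the sketch below automates one simple version of that idea, an assumption of this sketch: stop just before the single largest jump between consecutive merge distances. `select_num_clusters` is a hypothetical name.

```python
from typing import Sequence


def select_num_clusters(merge_distances: Sequence[float], num_items: int) -> int:
    """Return N, the number of clusters left just before the largest jump.

    merge_distances[t] is the dissimilarity of the (t + 1)-th merge, so after
    merges 0..t there are num_items - (t + 1) clusters.
    """
    if len(merge_distances) < 2:
        raise ValueError("need at least two merges to measure a jump")
    jumps = [merge_distances[t + 1] - merge_distances[t]
             for t in range(len(merge_distances) - 1)]
    t = max(range(len(jumps)), key=jumps.__getitem__)  # last merge before the biggest jump
    return num_items - (t + 1)
```

With M = 72 filters there are 71 merge distances. A single argmax can be fooled by noise in the curve, so in practice it is worth checking the selected point against a plot such as the one in the figure below, as the slide's manual inspection does.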


(a) Dissimilarity distance between the closest pair of clusters throughout the clustering process. (b) Dissimilarity distance between a validation image dataset cleaned with the specific filter and the same images cleaned with the true-class neural clustered filter.


Texture classification: Training a classifier for the N types of neural clustered filters

A classifier selects the neural clustered filter that is most suitable to enhance a given noisy image → MLP classifier.

An MLP was trained to estimate the posterior probability of each cluster class given a fixed-size square of pixels (from real noisy images). The input was fixed to 21 × 21 pixels. The output layer was composed of N units corresponding to the N neural clustered filters. The classifier was applied one portion at a time to the entire noisy image, and the estimates of all these portions were averaged in order to choose the most probable neural clustered filter.
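The patch-wise decision rule above can be sketched as: classify every 21 × 21 portion, average the N posterior vectors, and take the argmax. The sketch assumes non-overlapping tiles (stride 21) and skips partial tiles at the borders; neither detail is stated on the slide. `select_filter` is a hypothetical name, and `toy_classifier` stands in for the trained MLP with N = 2.

```python
from typing import Callable, List, Optional, Sequence

Image = List[List[float]]
PatchClassifier = Callable[[Sequence[float]], Sequence[float]]  # patch -> N posteriors


def select_filter(image: Image, classify: PatchClassifier,
                  patch: int = 21, stride: int = 21) -> int:
    """Average the per-patch posteriors over the image and return the argmax class."""
    h, w = len(image), len(image[0])
    totals: Optional[List[float]] = None
    count = 0
    for r in range(0, h - patch + 1, stride):  # assumed tiling; partial tiles skipped
        for c in range(0, w - patch + 1, stride):
            window = [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            probs = classify(window)
            totals = list(probs) if totals is None else [t + p for t, p in zip(totals, probs)]
            count += 1
    if totals is None:
        raise ValueError("image is smaller than one patch")
    averaged = [t / count for t in totals]
    return max(range(len(averaged)), key=averaged.__getitem__)


# Hypothetical stand-in for the trained MLP with N = 2 clusters:
# dark patches favour filter 0, bright patches favour filter 1.
def toy_classifier(window: Sequence[float]) -> List[float]:
    mean = sum(window) / len(window)
    return [0.9, 0.1] if mean < 0.5 else [0.3, 0.7]


page = [[0.0] * 21 + [1.0] * 21 for _ in range(21)]  # one dark and one bright 21x21 tile
chosen = select_filter(page, toy_classifier)
```

Averaging posteriors rather than counting per-patch votes lets confident patches outweigh ambiguous ones; the returned index would then pick which clustered filter to scan over the whole image.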


Evaluation of the Enhancement System

Specific neural filter vs. Our system: Distance = 37.88

1 Clean each real noisy image with its specific neural filter → “reference” cleaned images.
2 Compare the “reference” cleaned images with the output of the proposed enhancement system → calculate the Euclidean distance.

General neural filter vs. Our system: Distance = 62.46

1 One general neural filter: one MLP for all types of noise.
2 Compare the “reference” cleaned images with the same images cleaned with this general neural filter → calculate the Euclidean distance.

General neural filter vs. Our system with no classification errors: Distance = 28.92

1 This simulation, which assumes that the filter classifier makes no errors, reduced the average distance from 37.88 to 28.92.
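The evaluation compares each “reference” cleaned image with another cleaned version of the same image and averages the distances. The sketch below assumes a plain (unweighted) Euclidean distance per image, averaged over images; the slides do not state the pixel scale or any normalisation, so the toy values here are not comparable with 37.88, 62.46 or 28.92. `image_distance` and `average_distance` are hypothetical names.

```python
import math
from typing import List, Sequence

Image = List[List[float]]


def image_distance(a: Image, b: Image) -> float:
    """Euclidean distance between two same-sized images, treated as flat vectors."""
    return math.sqrt(sum((pa - pb) ** 2
                         for row_a, row_b in zip(a, b)
                         for pa, pb in zip(row_a, row_b)))


def average_distance(references: Sequence[Image], outputs: Sequence[Image]) -> float:
    """Mean per-image distance between reference images and system outputs."""
    total = sum(image_distance(r, o) for r, o in zip(references, outputs))
    return total / len(references)


references = [[[0.0, 0.0], [0.0, 0.0]], [[3.0, 0.0]]]
outputs = [[[0.0, 0.0], [0.0, 1.0]], [[0.0, 4.0]]]
score = average_distance(references, outputs)
```

Running the same function once with the proposed system's outputs and once with the general filter's outputs gives the two figures being compared; a lower value means the output is closer to what the type-specific filter would produce.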


An example of the enhancement and cleaning process. (a) Original real noisy image. (b) Result of applying a neural filter trained with all types of noise. (c) Result of applying the proposed neural clustered filter. (d) Result of applying the neural filter trained with only that type of noise.


Summary and Conclusions

An agglomerative hierarchical clustering of supervised-learning classifiers, which uses a measure of similarity among classifiers based on their behaviour on a validation dataset, has been proposed.

As an application of this clustering procedure, we have designed an enhancement system for document images using neural network filters.

Both objective and subjective evaluations of the cleaning method show excellent results in cleaning noisy documents (objectively, an average distance of 37.88 to the specific-filter reference, compared with 62.46 for a single general filter).

This method could also be used to clean and restore other types of images: noisy backgrounds in scanned documents, folded documents, stained paper of historical documents, images for vehicle license plate recognition, etc.

As immediate future work, we plan to evaluate OCR performance on real and enhanced images using our proposed system and other enhancement filters.
