Journal of Imaging Science and Technology 60(2): 020402-1–020402-10, 2016. © Society for Imaging Science and Technology 2016

Hierarchical Manifold Sensing with Foveation and Adaptive Partitioning of the Dataset

Irina Burciu, Thomas Martinetz, and Erhardt Barth
University of Lübeck, Institute for Neuro- and Bioinformatics, Ratzeburger Allee 160, D-23562 Lübeck, Germany
E-mail: [email protected]

Abstract. The authors present a novel method, Hierarchical Manifold Sensing, for adaptive and efficient visual sensing. As opposed to the previously introduced Manifold Sensing algorithm, the new version introduces a way of learning a hierarchical partitioning of the dataset based on k-means clustering. The algorithm can perform on whole images but also on a foveated dataset, where only salient regions are sensed. The authors evaluate the proposed algorithms on the COIL, ALOI, and MNIST datasets. Although they use a very simple nearest-neighbor classifier, on the easier benchmarks, COIL and ALOI, perfect recognition is possible with only six or ten sensing values. Moreover, they show that their sensing scheme yields a better recognition performance than compressive sensing with random projections. On MNIST, state-of-the-art performance cannot be reached, but they show that a large number of test images can be recognized with only very few sensing values. However, for many applications, performance on challenging benchmarks may be less relevant than the simplicity of the solution (processing power, bandwidth) when solving a less challenging problem. © 2016 Society for Imaging Science and Technology.
[DOI: 10.2352/J.ImagingSci.Technol.2016.60.2.020402]

Received June 30, 2015; accepted for publication Nov. 8, 2015; published online Dec. 10, 2015. Associate Editor: Chunghui Kuo.

INTRODUCTION
We present a novel method called Hierarchical Manifold Sensing (HMS). The objective is to develop sensing algorithms that increase the efficiency of visual sensing by adopting adaptive sensing strategies. The algorithms can also be used for resampling and compression before transmitting a densely sampled signal over a low-bandwidth channel. In other words, we address the question of how to efficiently sample the visual world under the constraint of a limited bandwidth. For example, the bandwidth is limited in human vision by the capacity of the optic nerve, and in technical systems by the performance and cost of hardware. As opposed to classical sensing and compression schemes, HMS is based on unsupervised learning, which involves a hierarchical partitioning of the dataset.

Hierarchical Manifold Sensing is inspired by Compressive Sensing (CS).1 Compressive Sensing is based on the fact that natural images can be encoded sparsely,2 and thus the number of samples used for representing an image accurately can be reduced by sensing with a random matrix.3 As opposed to classical sampling, with CS each acquired sensing value is a weighted sum of the original unknown signal. Hierarchical Manifold Sensing works in a similar way, i.e., CS and HMS both make use of a sensing matrix. As opposed to CS, where the sensing matrix does not depend on the sensed data, HMS introduces a two-fold adaptivity: (i) the sensing algorithm adapts to a particular dataset, and (ii) every new sensing value depends on the already acquired sensing values. Thus, sensing in HMS is performed adaptively with optimized weights, and not randomly as in CS.

Schütze et al.4 presented an alternative adaptive hierarchical sensing (AHS) scheme for efficiently obtaining the sparse coefficients of an image. The sensing process is performed by partially traversing a binary tree and making a measurement at each visited node. The method is adaptive in the sense that after each sensing action, depending on how much gain the sensing operation brings, it is decided whether the entire subtree of the current node is further traversed or whether it is omitted. Adaptive hierarchical sensing was applied on patches of natural images, and it was shown that the performance of the method can be improved by choosing an appropriate sparse coding basis and by properly arranging the AHS tree. The results of the method strongly depend on the decision step, where a threshold is compared with the measurement values corresponding to the binary tree.

Baraniuk presented a theoretical analysis of CS for manifolds in 2009,5 and showed that, similarly to the theory of CS, only a small number of random linear projections is sufficient to preserve the key information on a signal modeled by a manifold. Later, Chen et al.6 proposed a statistical framework for CS on manifolds. Their article presents a nonparametric hierarchical Bayesian algorithm that learns a mixture of factor analyzers for manifolds based on the training data. Afterwards, the signal is reconstructed using a limited number of random projections. The method is validated on synthetic and on real datasets, but it is evaluated only for a subset of the MNIST database.

The Manifold Sensing concept was introduced before by Burciu et al. as visual Manifold Sensing (MS),7 and it was extended afterwards to the foveated version of Manifold Sensing (FMS),8 which was inspired by the sampling strategy of biological systems. Like Manifold Sensing, HMS is based on a geometric approach. Both MS and FMS are based on learning manifolds of increasing but low dimensionality by using a nonlinear algorithm, namely Locally Linear Embedding (LLE).9 While sensing, the dataset is continuously adapted and the corresponding embedding is learned. As a further and optional feature, HMS can involve foveation, as in the FMS approach.
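Both CS and HMS thus reduce each sensing action to multiplying the unknown signal by the rows of a sensing matrix; what differs is where the rows come from. The following NumPy sketch illustrates the contrast (an illustration under the assumptions stated in the comments, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def cs_sense(x, m):
    """CS baseline: m sensing values, each a weighted sum of the unknown
    signal x, taken with fixed random Gaussian rows of unit length (the
    random-projections baseline used in the experiments below)."""
    A = rng.standard_normal((m, x.size))
    A /= np.linalg.norm(A, axis=1, keepdims=True)
    return A @ x

def hms_sense_step(x, U):
    """HMS-style step: the rows of U are learned from the dataset (here
    assumed to be principal components of the current cluster), so the
    weights are optimized, and U changes as sensing proceeds down the tree."""
    return U @ x
```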


Both MS and FMS strongly depend on the choice of the following parameters: (i) the number of neighbors used for LLE, (ii) the decreasing sizes of the adaptive dataset, and (iii) the dimensions of the manifolds at each iteration of the algorithm. Moreover, MS and FMS operate on the entire dataset while sensing. As highlighted in the FMS article,8 in a real-time sensing scenario one would need to learn all of the manifolds corresponding to all possible subsets of the data before performing the actual sensing. In this article we therefore provide an extended version of MS and FMS, which includes an adequate partitioning learned prior to sensing.

Hierarchical Manifold Sensing as used in this article is also based on learning manifolds of different and low dimensionality. However, for simplicity, we here use a linear method, Principal Component Analysis (PCA), to learn the low-dimensional representations of the foveated dataset. The hierarchical partitioning of the dataset is performed by clustering the data in the low-dimensional manifolds using the k-means algorithm.

Several approaches aim at developing efficient clustering methods for high-dimensional data; see, for example, Ref. 10. In this work we focus on solving the sensing problem, not on optimizing the approach for hierarchical partitioning of the data. Therefore, we simply combine two standard techniques: PCA for dimensionality reduction and k-means for clustering (k-means++ implementation11).

Like MS and FMS, HMS is optimized and evaluated with respect to particular recognition tasks, and not with respect to the reconstruction error.

In the following section we present the Hierarchical Manifold Sensing method: we first explain how the foveated dataset is created; we then present in detail the steps of the hierarchical partitioning of the dataset; and we finally show how unknown scenes are sensed in a hierarchical way. After that, we present the results of this work and our conclusions.

HIERARCHICAL MANIFOLD SENSING
Hierarchical Manifold Sensing (HMS) is based on a geometric approach to the problem of efficient sensing. A particular type of environment is represented by the images Ii in a dataset D = {I1, ..., Ip} with p data points of dimension D. In the foveated version of HMS, which is considered here, the dataset D is first transformed into a foveated dataset Dfoveated that contains only regions of interest out of the original dataset.

The goal is to learn efficient features for classification. This problem is, however, not approached by just unsupervised learning on the whole dataset Dfoveated. Instead, a tree structure that involves a hierarchical partitioning of the dataset is learned. The resulting partitioning is used to solve the sensing problem more efficiently, i.e., to use as few sensing actions as possible in order to sense and classify an unknown scene or object.

In the following subsections we first review the procedure of creating the foveated dataset, which was presented in more detail in Ref. 8. Next, we describe the approach for the hierarchical partitioning of the dataset. These two steps define the offline part of the HMS algorithm. After we have learned the foveated hierarchical representation of the given dataset, we can project on it an unknown scene, i.e., a test point outside Dfoveated, that we wish to sense. Hierarchical Manifold Sensing thus includes the following main steps:

• Creating a Foveated Dataset, based on a dataset containing images of known scenes
• Hierarchical Partitioning of the Dataset
• Hierarchical Sensing of Unknown Scenes (here implemented by resampling of unknown test images)

Creating a Foveated Dataset
The foveated dataset Dfoveated contains only the pixels that are salient on average over the dataset. Although these pixels do not necessarily form a compact region of interest (ROI), we will denote the collection of salient pixels as the ROI. The ROI is extracted by using a saliency model based on the geometric invariants of the structure tensor of the images in the dataset D. The invariants of the structure tensor are known to be good predictors of human eye movements for static scenes.12 In Ref. 12, the properties of the image regions selected by saccadic eye movements during experiments were analyzed in terms of higher-order statistics. It was shown that image regions with a statistically less redundant structure, such as the ones given by signals with intrinsic dimension two, contain all the necessary information of a static scene. Therefore, signals with intrinsic dimension two are considered to be more salient. The intrinsic dimension (iD) refers to the relation between the degrees of freedom of a signal domain and the actual degrees of freedom used by a signal. Thus, signals with i0D are constant within a local window; signals with i1D can be approximated by a function of only one variable (e.g., straight lines, edges); and signals with i2D are, for example, corners, curved lines, junctions, and curved edges. In this approach we use the geometric invariant S (the determinant), for which the regions of an image with S nonzero are i2D.

Algorithm 1 sketches the steps of creating the foveated dataset. The notations used for the algorithm are included in Table I.

Table I. Notations for Algorithm 1.

    Notation     Description
    D            D = {I1, ..., Ip} contains p data points of dimension D
    Ii           Image i with coordinates (x, y)
    J            Structure tensor
    * w          Convolution with kernel w
    Ix, Iy       First-order partial derivatives of Ii
    S            Determinant of the structure tensor
    R            Average saliency template
    ◦            Element-wise product of matrices
    Ti           Region of interest for image Ii
    Dfoveated    Dfoveated = {T1, ..., Tp}


Algorithm 1 computes for the given dataset D the corresponding foveated dataset Dfoveated of the same size p and dimension D; the only difference is that nonsalient pixels are set to zero in every image. For each image Ii in the given dataset D, the geometric invariant Si is computed as shown in line 4 of the algorithm. Here, Si is defined as the determinant of the structure tensor J13 (a matrix with the locally averaged products of first-order partial derivatives of image Ii; see line 3). Each invariant image Si is normalized to the range [0, 1]; the normalized Si are then summed over all images, and the resulting average saliency map is transformed into a binary saliency map R based on a threshold θ.

It should be noted that during evaluation the recognition rate is computed for different values of θ, which results in differently sized regions of interest (different numbers of salient pixels). Based on the training set, an optimal threshold is chosen. By using R, we define for each image the region of interest Ti, i.e., an image that contains the original image values where R = 1 and is equal to zero elsewhere.
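The following Python sketch mirrors Algorithm 1 as described above; the Gaussian window width sigma and the exact normalization constants are assumptions, since the text specifies only the structure of the computation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def saliency_determinant(img, sigma=2.0):
    """Determinant S of the structure tensor J of one image; regions with
    S nonzero are i2D (corners, junctions, curved edges)."""
    Iy, Ix = np.gradient(img.astype(float))
    # Locally averaged products of first-order derivatives (line 3)
    Jxx = gaussian_filter(Ix * Ix, sigma)
    Jxy = gaussian_filter(Ix * Iy, sigma)
    Jyy = gaussian_filter(Iy * Iy, sigma)
    return Jxx * Jyy - Jxy ** 2        # det(J), line 4

def foveate_dataset(images, theta):
    """Sum the normalized saliency maps, threshold the average with theta
    to obtain the binary template R, and mask every image."""
    S_sum = np.zeros_like(images[0], dtype=float)
    for img in images:
        S = saliency_determinant(img)
        S_sum += (S - S.min()) / (S.max() - S.min() + 1e-12)  # to [0, 1]
    R = (S_sum / len(images)) >= theta
    return [img * R for img in images]  # T_i: original values where R = 1
```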

Hierarchical Partitioning of the Dataset
We create a tree with L levels, which contains the hierarchical partitioning of the dataset. The partitioning is performed in the following way: (i) we learn a manifold of dimension NL (corresponding to level L of the tree) by using PCA, and (ii) we cluster the NL-dimensional representation of the data into k clusters using the k-means algorithm. In the experiments presented here, NL increases by 1 at each level of the tree. The structure of the tree is presented in Figure 1. The root of the tree is defined as D11, and for L > 1 the nodes of the tree are denoted as D_{Lkc}^{(L−1)kf}, with kc = 1, ..., k being the current cluster and kf the index of the father node.

Considering the representation of the tree shown in Fig. 1, we initialize the nodes by applying the partitioning function presented in Algorithm 2. The notations used in Algorithm 2 are presented in Table II.

Figure 1. Hierarchical partitioning of the dataset D. The root is defined as D11. Here, L is the level of the tree, k is the number of clusters, kc is the current cluster, and kf is the index of the father node. Each node D_{Lkc}^{(L−1)kf} of the tree is split into k clusters and contains the learned manifold, in dimension NL, NL + 1, etc., of the corresponding cluster.

The partitioning function expects as input the current cluster currCluster, the number k of clusters, and NL, the number of principal components. The function computes for each node of the tree the matrix U, which contains the feature vectors of dimensionality NL (line 3 of the algorithm); the projection of the data points of the current cluster on the matrix U (line 4 of the algorithm); and the index, which indicates to which of the k clusters the images in currCluster were assigned by the k-means algorithm (line 5 of the algorithm). We use the index to create the subset for the kth child: we select, with the keep function in line 7 of Algorithm 2, only the images that belong to cluster j. Recursively, we compute U, Y, index, and child (line 8 of the algorithm). If the current cluster does not have more data points than the number of clusters, the cluster will be empty.
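A compact sketch of this recursive partitioning, using scikit-learn in place of the unspecified implementation details; the Node container, the stopping test, and keeping the fitted PCA object instead of the raw matrix U are choices of this sketch, not of the paper (scikit-learn's KMeans uses k-means++ seeding by default, matching Ref. 11):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

class Node:
    """One tree node: PCA basis (the matrix U of Table II), embedding Y,
    class labels, k-means assignment, and child nodes."""
    def __init__(self, pca, Y, labels, assign, children):
        self.pca, self.Y, self.labels = pca, Y, labels
        self.assign, self.children = assign, children

def partition(data, labels, k, n_components):
    """Sketch of Algorithm 2: PCA to N_L dimensions, k-means on the
    embedding, then recurse into each cluster with N_L + 1 components."""
    if len(data) <= k:                    # cluster too small: empty node
        return None
    pca = PCA(n_components=min(n_components, len(data) - 1)).fit(data)
    Y = pca.transform(data)                                   # lines 3-4
    assign = KMeans(n_clusters=k, n_init=10).fit_predict(Y)   # line 5
    children = [partition(data[assign == j], labels[assign == j],
                          k, n_components + 1)                # lines 7-8
                for j in range(k)]
    return Node(pca, Y, labels, assign, children)
```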

Hierarchical Sensing of Unknown Scenes
The HMS algorithm uses the hierarchical partitioning of the dataset presented before to solve the sensing problem in an efficient way. An unknown scene, i.e., a test point xtest outside D, which we wish to sense, is therefore successively projected on the learned tree.


Table II. Notations for Algorithm 2.

    Notation           Description
    currCluster        Current cluster
    L                  Current level of the tree
    k                  Number of clusters for k-means
    NL                 Number of principal components for each L
    U                  Matrix containing the feature vectors; size(U) = (D × NL)
    Y                  Projected data points of currCluster on matrix U; size(Y) = (NL × p)
    index              index = 1, ..., k; each Ii in currCluster belongs to cluster index
    currCluster.child  Contains the children of the current cluster node
    currCluster.hp     Contains U, Y, and index for each node of the tree

We project xtest on the cluster which contains the nearest neighbor of the projected test point on the learned embedding. Algorithm 3 presents this procedure; the notations used are included in Table III. The test point is first projected on the low-dimensional representation of the rootCluster in line 5 of the algorithm. In line 6, the function find1NN searches for the nearest neighbor nearNeigh of the projected point on the learned manifold. The classify function checks whether the test point was correctly classified (whether the nearest neighbor and the test point belong to the same class). In line 8 we check to which cluster the nearest neighbor belongs and we define it as clusterNN. In line 9 of the algorithm, currCluster is updated: it is given by the child that corresponds to cluster clusterNN.

The algorithm continues by projecting the test point on the next level L of the tree, as long as the current cluster is not empty.
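Against the Node structure sketched above, this descent can be written as follows; find1NN is realized here as a brute-force nearest-neighbor search, an assumption that is consistent with the simple 1NN classifier used throughout the paper:

```python
import numpy as np

def hms_sense(root, x_test):
    """Sketch of Algorithm 3: walk down the learned tree, taking N_L new
    sensing values per level, and return the class of the last nearest
    neighbor found."""
    node, predicted = root, None
    while node is not None:
        y_test = node.pca.transform(x_test.reshape(1, -1))       # line 5
        i = np.argmin(np.linalg.norm(node.Y - y_test, axis=1))   # line 6
        predicted = node.labels[i]          # 1NN class at this level
        cluster_nn = node.assign[i]         # line 8: cluster of the neighbor
        node = node.children[cluster_nn]    # line 9: descend into that child
    return predicted
```

With a (p × D) training matrix and its class labels, root = partition(data, labels, k=2, n_components=1) followed by hms_sense(root, x_test) would reproduce the simplest configuration used in the experiments below (k = 2, N1 = 1).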

RESULTS
We first evaluated HMS on the Columbia Object Image Library (COIL-20)14 and Amsterdam Library of Object Images (ALOI)15 databases. The COIL-20 database contains 1440 grayscale images of 20 objects, with 72 images for each object. The images have 128 × 128 pixels and were taken at object-pose intervals of 5°.

Table III. Notations for Algorithm 3.

    Notation     Description
    currCluster  Current cluster, which contains currCluster.child and currCluster.hp
    xtest        Test data point; size(xtest) = (1 × D)
    ytest        Projection of xtest on the low-dimensional representation
    nearNeigh    Nearest neighbor of ytest on the low-dimensional representation
    recogHMS     Is 1 if xtest and nearNeigh have the same class, and 0 otherwise
    clusterNN    Indicates to which cluster nearNeigh belongs

The ALOI database is a color image object recognition benchmark with variation in viewing angle. The original dataset contains 1000 different objects, with 72 viewing angles for each object. In our experiments we used 50 of all the classes, at a quarter of the original resolution, 192 × 144. We worked with resized gray-level images of size 128 × 128 pixels.

In order to evaluate the presented HMS method with foveation and adapted data partitioning, we divided the datasets into training and test data, and we computed the recognition rates for the test data by assigning to each test image the corresponding class of the nearest neighbor. For each dataset we chose randomly one image per class, and we tested them against the other images, which belonged to the training dataset. The goal of HMS is to use as few sensing actions as possible and still obtain the highest possible recognition rate. Therefore, we searched for the minimum number of sensing actions that HMS needs to perform in order to achieve the highest possible recognition rate for all of the tested images.

Performance of HMS With and Without Foveation Versus Random Projections
We explored the benefits of the presented approach by comparing HMS, with and without foveation, with the classical CS method, which uses random projections, i.e., a random Gaussian matrix with rows that have unit length, and a smaller number of components. In order to do this, we considered the simplest configuration for the hierarchical partitioning of the data: k = 2 clusters and the dimension NL for the first level of the tree equal to 1. We computed the recognition rate for differently sized regions of interest, e.g., 16% for a foveated dataset, and up to 100% for the original dataset. For both HMS and CS we used the same 1NN classifier.

Figure 2 shows, for the database ALOI with 20 classes, how the recognition rate depends on the region of interest, i.e., the number of salient pixels. The curves are plotted for three different numbers (one, three, and six) of sensing values, i.e., for L = 1, L = 2, and L = 3, respectively. For L = 1, HMS senses with a sensing matrix of dimension (N1 × D), where N1 = 1, i.e., it takes only one sensing value. For L = 2, HMS senses with a sensing matrix of (2 × D), which adds up to 1 + 2 = 3 sensing values, and so on.
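In general, since NL increases by one per level, the number of sensing values accumulated after descending to level L follows directly from the construction: m(L) = Σ_{l=1}^{L} (N1 + l − 1) = L·N1 + L(L−1)/2. For N1 = 1 this gives m(L) = L(L+1)/2, i.e., 1, 3, 6, 10, ... sensing values for L = 1, 2, 3, 4, matching the counts of one, three, and six used here and the six or ten values quoted in the abstract.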

J Imaging Sci Technol 020402-4 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Figure 2. The HMS results for the ALOI 20 database. Recognition rates are shown for different regions of interest, with one, three, and six sensing values, respectively. The hierarchical partitioning is performed with k = 2 and NL = 1 for L = 1.

It should be noted that for L = 1, where only one sensing value is available, foveation deteriorates the result. However, already with three and six sensing values, the recognition performance does not increase with the number of pixels that are considered, i.e., the performance is equally high with an ROI of only 5% of the image.

Figure 3 shows a selection of seven out of 28 HMS sensing basis functions with an ROI of 8% (first row), 16% (second row), and without foveation (third row). It should be noted that the basis functions for the two different ROIs are specific to the test image shown on the right, as each new basis function depends on the previously acquired sensing values. Thus, the basis functions evolve, as we continue sensing, from rather generic to more specific templates. It should also be noted that the ROIs adapt accordingly. In comparison, the third row of Fig. 3 shows the corresponding selection of basis functions for the case where the hierarchical partitioning was computed for the original dataset and not for the foveated dataset as shown before. It should be noted that without foveation the basis functions do also adapt during the hierarchical partitioning, but the adaptation is less specific.

In Figure 4(a)–(c) we present representative results with foveation (ROI = 16%) and without foveation (ROI = 100%) for different benchmarks: COIL with 20 classes, and ALOI with 20 and 40 classes. We compare the results of HMS with the results obtained by using a random-projections matrix for sensing with the corresponding number of sensing values. For all methods we computed 100 runs, and we present in Fig. 4 the average of the recognition rate over these runs. As can be seen in Fig. 4(a)–(c), on the ALOI database with 20 and 40 classes and on the COIL-20 database we are able to reach a recognition rate of 100% with a region of interest of only 16% and six sensing values, which corresponds to a compression ratio greater than 4500 in the case of ALOI and greater than 2500 for COIL. The compression ratio is defined by the ratio between the original size of the images of the respective dataset (number of pixels) and the number of sensing values used for recognition. It should be noted that the recognition performance of HMS is higher than the recognition rate obtained with the CS random-projections approach on the considered databases.
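These ratios can be checked from the image sizes given above, taking the "original size" to be the pixel count before resizing: for ALOI at 192 × 144 = 27,648 pixels and six sensing values, 27,648/6 = 4608 > 4500, and for COIL at 128 × 128 = 16,384 pixels, 16,384/6 ≈ 2731 > 2500.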

We conclude that for small databases, such as COIL and ALOI with 20 and 40 classes, it is sufficient to use the HMS algorithm with a simple configuration: two clusters and NL = 1 for L = 1. When the database contains more training images, e.g., ALOI with 50 classes, it is worth studying the evolution of HMS for different hierarchical trees, focussing on the different number of principal components of the first level of the tree, N1, and on a different number of clusters k. We show in Figure 5 the results obtained with HMS for N1 = 1, 2, and 3, in the case of k = 2 (a) and k = 3 (b). As expected, HMS performs better with a higher number of principal components for the first level of the hierarchical partitioning, and with a proper k, considering the number of data points in the training dataset.

Figure 3. Selected HMS sensing basis functions (seven out of 28 sensing values) with an ROI of 8% (first row) and 16% (second row) for the foveated dataset, without foveation (ROI of 100%, third row), and the corresponding test image from ALOI with 20 classes. For the hierarchical partitioning we used k = 2 clusters and NL = 1 for L = 1.


Figure 4. Representative results of HMS with and without foveation versus random projections for different benchmarks.

We also evaluated the algorithm on the highly competitive MNIST18 benchmark, which consists of handwritten digits from 0 to 9. There are 60,000 images for training and 10,000 for testing.

We first considered the simple configuration for partitioning the data, with NL = 1 for L = 1 and only k = 2 clusters. Although the overall performance of a sensing and recognition scheme with, for example, L = 12 is limited to a recognition rate of 93.14%, it is interesting to note that, of the 10,000 test images, 2491 are already correctly recognized with only one sensing action (L = 1).

Figure 5. Results of HMS for different hierarchical partitionings of the training data from ALOI with 50 classes.

Of the remaining test images, 2436 are correctly classified with L = 2, 2689 of the remaining images with L = 3, and 1290 of the remaining images with L = 4. If this scheme is continued up to L = 12 and L = 13, a total of 9850 and 9851, respectively, of the test images are correctly classified. The difference between 9850 and 9314 at L = 12 is due to the fact that a few images are obviously misclassified with more sensing values, although they would have been correctly classified with fewer. We explored the performance of HMS on the MNIST dataset for different hierarchical partitionings of the training dataset, i.e., with different values of k and N1. As shown before for the previously considered databases, the recognition rate grows with N1. We show in Figure 6 the performance of HMS for different values of N1 in the case of (a) k = 2 and (b) k = 3 clusters. The curves are plotted for different numbers of sensing values, i.e., for L = 1, L = 2, ..., L = 9 in Fig. 6(a), and for L = 1, L = 2, ..., L = 6 in Fig. 6(b). As can be seen in Fig. 6(a), for k = 2 and N1 = 20 we reach a recognition rate of 96.69% for L = 3, i.e., with 63 sensing values.


Figure 6. Results of HMS on MNIST for a hierarchical partitioning of the training data with N1 = 7, 9, 14, 20 and (a) k = 2, (b) k = 3, and using accumulated sensing values (Acc) over the different levels of the sensing tree.

For L = 4 we reach a higher recognition rate of 96.82%.

If we accumulate the sensing values over the different levels of the sensing tree, the recognition rate improves, as shown in Fig. 6. With k = 2 and N1 = 20, HMS reaches a recognition rate of 96.88% with 63 sensing values, and 96.93% with 86 sensing values.

In Figure 7 we show the results for HMS compared with CS. The parameters for HMS are N1 = 20 and k = 2, and we computed only 10 runs. It should be noted that HMS outperforms CS. The region of interest of 23% seems not to contain sufficient salient pixels to reach a higher performance than HMS without foveation, as was the case for the other tested databases.

Figure 8 shows a selection of five sensing basis functions for MNIST, obtained with a hierarchical partitioning with k = 2 and N1 = 9. In the first column, sample test images with the corresponding number and class are shown, and on each line the corresponding sensing basis functions are shown for a different number of sensing values, i.e., for L = 1, L = 3, L = 7, L = 8, and L = 9.

Figure 7. Results on MNIST for HMS versus random projections.

One can note the evolution from rather generic to more specific templates.

Performance of HMS Versus MS and FMS
In Ref. 7 it has been shown that, for the ALOI database with 20 classes, MS needs 38 sensing values in order to reach a 100% recognition rate. In Ref. 8, FMS reaches only a 65% recognition rate with 15 sensing values. Here, we showed that for a 100% recognition rate only six sensing values are needed for HMS with foveation, and ten sensing values for HMS without foveation. Both MS and FMS strongly depend on the number of neighbors selected for the Locally Linear Embedding used to learn the manifolds, on the decreasing size of the adaptive dataset, and on the dimension of the manifolds at each iteration of the algorithm.

The important difference between HMS and MS/FMS is that with HMS the partitioning of the dataset is performed prior to sensing. As a consequence, finding the optimal parameters for the partitioning is more difficult for MS and FMS.

Performance of HMS Versus Other Methods
In 2014, Dornaika et al.16 developed a semi-supervised feature extraction with out-of-sample extension algorithm, which they applied on a subset of the COIL-20 database (18 images out of the 72 available for each object). They randomly selected 50% of the data as the training dataset and the rest as the test dataset. From the training dataset they randomly labeled one, two, and three samples per class, and the rest of the data were used as unlabeled data. The data are first preprocessed: PCA is computed in order to preserve 98% of the energy of the dataset. The work in Ref. 16 provides a comparison between methods that are based on label propagation and on graph-based semi-supervised embedding. They report the best average classification results on ten random splits for their method with three labeled samples, for unlabeled (80.4%) and test data (77.4%).


Figure 8. Selected HMS sensing basis functions without foveation (second to fifth columns, for L = 1, 3, 7, 8, 9) and the corresponding sample test images (first column) from the MNIST dataset. For the hierarchical partitioning we used k = 2 clusters and NL = 9 for L = 1.

They also show that when one labeled sample per class is used, their method reaches a 61% recognition rate with around 19 feature dimensions. In order to compare HMS with the approach proposed in Ref. 16, we divided the COIL dataset, with 72 images per class, into training and test datasets in a way similar to that described before. Although the training dataset consists of only 720 images, HMS performs better than the semi-supervised feature extraction algorithm of Ref. 16. Thus, for a hierarchical partitioning of the training data with k = 2 and N1 = 1, HMS reaches an average recognition rate (over ten random splits of the data) of 94.98% with 15 sensing values. If the partitioning is performed with the same number of clusters but N1 = 2, a higher recognition rate of 95.98%, at L = 5, is reached with 20 sensing values.

A recent article17 presents an out-of-sample generalization algorithm for supervised manifold learning for classification, which is evaluated on the COIL-20 dataset. The authors use 71 images for each of the 20 objects in COIL, which they normalize, convert to grayscale, and downsample to a resolution of 32 × 32 pixels.


The algorithm embeds the images in a 25-dimensional space. They obtain a minimum average misclassification rate over five runs of approximately 2%. We compared HMS with the approach in Ref. 17, and we obtained an average misclassification rate of 4.2% with ten sensing values and a 100% recognition rate with 15 sensing values. For the hierarchical partitioning we used k = 2 and N1 = 1. Thus, HMS reaches a 100% recognition rate with even fewer sensing values than in Ref. 17.

On the MNIST database, a baseline method18 that uses as input to a second-degree polynomial classifier the 40-dimensional feature vector of the training data obtained with PCA has an error rate of 3.3%, compared with our 3.32% with 41 sensing values and 3.12% with 63 sensing values, as shown in Fig. 6(a).

State-of-the-art performance on MNIST has been achieved by a recurrent convolutional neural network (RCNN) approach, which introduces recurrent connections into each convolutional layer.19 The approach reaches a testing error of 0.31% and uses 670,000 parameters. As argued in the following section, our goal is to explore the HMS algorithm in terms of the best recognition rate reached with as few sensing actions as possible, rather than to increase complexity for maximum performance.

CONCLUSIONS AND FUTURE WORK
We have presented a novel algorithm which aims at efficient sensing, and we have evaluated the efficiency in terms of the resulting recognition performance. We assume the availability of a dataset of images that represent the type of objects and scenes that need to be sensed and recognized. Based on these data, the goal is to learn a sensing strategy such that recognition is possible with few sensing values. Although we use a very simple nearest-neighbor classifier, on easy benchmarks such as COIL and ALOI with only some classes, perfect recognition is possible with only about ten sensing values. On harder benchmarks such as MNIST, state-of-the-art performance could not be reached, but we could show that a large number of test images could be recognized with only very few sensing values. Such performance resembles human performance, since humans can effortlessly recognize a multitude of objects based on just the gist of a scene, and require scrutiny for less familiar objects and more difficult recognition tasks. A further bio-inspired element of our algorithm is foveation, and we have shown that gist-like sensing and recognition requires the whole image, whereas more refined sensing can be reduced to only a few salient locations without deteriorating recognition performance. It should be noted, however, that we do not use the saliency of the actual image but only the average saliency of all images in the hierarchical dataset. In terms of compressive sensing, we have here proposed a sensing scheme with a learned sensing matrix, and we have shown that it leads to better recognition performance than random sensing. In the case of foveation, the sensing matrix is also sparse, in addition to being learned, i.e., adapted to the specific dataset.

A weakness of the proposed approach is that it offers a rather high number of choices. For example, it is not obvious in which dimension to start sensing and how to increase the dimension of the embedding manifolds. In future work, however, this weakness could be turned into a benefit by exploring different strategies. A further weakness is that the method, like many others, will be less efficient if many different objects need to be recognized in the same scene. This scenario would most likely require a preliminary stage of object detection and rough object segmentation.

It should be noted that we are here addressing a problem that is not typically addressed in current computer-vision challenges, but that is becoming increasingly relevant as computer-vision systems become more pervasive. While currently the focus is on maximizing recognition performance, an equally challenging problem consists of finding the simplest solution for a given problem. Simplicity can be defined in different ways, and we here adopt the approach of using a minimum number of sensing values and a simple classifier. This reduces both the required bandwidth of the sensor and the required processing power.

ACKNOWLEDGMENTS
This research is funded by the Graduate School for Computing in Medicine and Life Sciences, funded by Germany's Excellence Initiative [DFG GSC 235/1].

REFERENCES
1. E. Candès and M. Wakin, "Introduction to compressive sampling," IEEE Signal Process. Mag. 25, 21–30 (2008).
2. B. Olshausen and D. Field, "Natural image statistics and efficient coding," Netw. Comput. Neural Syst. 7, 333–339 (1996).
3. D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory 52, 1289–1306 (2006).
4. H. Schütze, E. Barth, and T. Martinetz, "An adaptive hierarchical sensing scheme for sparse signals," Proc. SPIE 9014, 151–8 (2014).
5. R. Baraniuk and M. Wakin, "Random projections of smooth manifolds," Found. Comput. Math. 9, 51–77 (2009).
6. M. Chen, J. Silva, J. Paisley, C. Wang, D. Dunson, and L. Carin, "Compressive sensing on manifolds using a nonparametric mixture of factor analyzers: algorithm and performance bounds," IEEE Trans. Signal Process. 58, 6140–6155 (2010).
7. I. Burciu, A. Ion-Margineanu, T. Martinetz, and E. Barth, "Visual manifold sensing," Proc. SPIE 9014, 481–8 (2014).
8. I. Burciu, T. Martinetz, and E. Barth, "Foveated manifold sensing for object recognition," Proc. IEEE Black Sea Conf. on Communications and Networking (2015), pp. 196–200.
9. S. Roweis and L. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science 290, 2323–2326 (2000).
10. N. Tajunisha and V. Saravanan, "An efficient method to improve the clustering performance for high dimensional data by Principal Component Analysis and modified K-means," International Journal of Database Management Systems 3 (2011).
11. D. Arthur and S. Vassilvitskii, "K-means++: the advantages of careful seeding," SODA '07: Proc. Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (2007), pp. 1027–1035.
12. C. Zetzsche, K. Schill, H. Deubel, G. Krieger, E. Umkehrer, and S. Beinlich, "Investigation of a sensorimotor system for saccadic scene analysis: an integrated approach," From Animals to Animats 5: Proc. Fifth Int. Conf. on Simulation of Adaptive Behavior, edited by R. Pfeifer et al. (MIT Press, 1998), Vol. 5, pp. 120–126.
13. B. Jähne, H. Haußecker, and P. Geißler, Handbook of Computer Vision and Applications (Academic Press, 1999), Vol. 2.
14. S. A. Nene, S. K. Nayar, and H. Murase, Columbia Object Image Library (COIL-20), Technical Report CUCS-005-96 (1996).
15. J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders, "The Amsterdam Library of Object Images," Int. J. Comput. Vis. 61, 103–112 (2005).
16. F. Dornaika, Y. El Traboulsi, B. Cases, and A. Assoum, "Image classification via semi-supervised feature extraction with out-of-sample extension," Advances in Visual Computing: ISVC, Part I (2014), Vol. 8887, pp. 182–192.
17. E. Vural and C. Guillemot, Out-of-sample generalizations for supervised manifold learning for classification, http://arxiv.org/abs/1502.02410 [cs.CV] (2015).
18. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE 86, 2278–2324 (1998).
19. M. Liang and X. Hu, "Recurrent convolutional neural network for object recognition," IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (IEEE, Piscataway, NJ, 2015).


All in-text references underlined in blue are linked to publications on ResearchGate letting you access and read them immediatelyAll in-text references underlined in blue are linked to publications on ResearchGate letting you access and read them immediately

  • T1
  • F1
  • T2
  • T3
  • F3
  • F2
  • F4
  • F5
  • F6
  • F7
  • F8
  • B1
  • B2
  • B3
  • B4
  • B5
  • B6
  • B7
  • B8
  • B9
  • B10
  • B11
  • B12
  • B13
  • B14
  • B15
  • B16
  • B17
  • B18
  • B19
Page 2: Hierarchical Manifold Sensing with Foveation and Adaptive ...webmail.inb.uni-luebeck.de/inb-publications/pdfs/BuMaBa16.pdf · Baraniuk presented a theoretical analysis of CS for manifolds

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Both MS and FMS strongly depend on the choice ofthe following parameters (i) the number of neighbors usedfor LLE (ii) the decreasing sizes of the adaptive datasetand (iii) the dimensions of the manifolds at each iterationof the algorithm Moreover MS and FMS are operating onthe entire dataset while sensing As highlighted in the FMSarticle8 in a real-time sensing scenario one would needto learn all of the manifolds corresponding to all possiblesubsets of data before performing the actual sensing In thisarticle we therefore provide an extended version of MS andFMS which includes an adequate partitioning learned priorto sensing

Hierarchical Manifold Sensing as used in this articleis also based on learning manifolds of different and lowdimensionality However for simplicity we here use a linearmethod Principal Component Analysis (PCA) to learn thelow-dimensional representations of the foveated datasetThe hierarchical partitioning of the dataset is performed byclustering the data in the low-dimensional manifolds usingthe k-means algorithm

Several approaches aim at developing efficient clusteringmethods for high-dimensional data see for example Ref 10In this work we focus on solving the sensing problem and noton optimizing the approach for hierarchical partitioning ofthe data Therefore we just combine two simple approachesPCA for dimensionality reduction and k-means for clustering(k-means++ implementation11)

LikeMS and FMSHMS is optimized and evaluatedwithrespect to particular recognition tasks and not with respect tothe reconstruction error

In the following section we present the HierarchicalManifold Sensing method we first explain how the foveateddataset is created we then present in detail the steps ofhierarchical partitioning of the dataset and we finally showhow the unknown scenes are sensed in a hierarchical wayAfter that we present the results of this work and conclusions

HIERARCHICALMANIFOLD SENSINGHierarchical Manifold Sensing (HMS) is based on a geomet-ric approach to the problem of efficient sensing A particulartype of environment is represented by the images I i in adatasetD= I1 Ip with p data points of dimension DIn the foveated version of HMS which is considered herethe dataset D is first transformed into a foveated datasetDfoveated that contains only regions of interest out of theoriginal dataset

The goal is to learn efficient features for classificationThis problem is however not approached by just unsu-pervised learning on the whole dataset Dfoveated Insteada tree structure that involves a hierarchical partitioning ofthe dataset is learned The resulting partitioning is used tosolve the sensing problem more efficiently ie to use as fewsensing actions as possible in order to sense and classify anunknown scene or object

In the following subsections we first review the proce-dure of creating the foveated dataset which was presentedin more detail in Ref 8 Next we describe the approach for

the hierarchical partitioning of the dataset These two stepsdefine the offline part of the HMS algorithm After we havelearned the foveated hierarchical representation of the givendataset we can project on it an unknown scene ie a testpoint outside Dfoveated that we wish to sense HierarchicalManifold Sensing thus includes the following main steps

Creating a Foveated Dataset based on a dataset contain-ing images of known scenesHierarchical Partitioning of the DatasetHierarchical Sensing of Unknown Scenes (here imple-mented by resampling of unknown test images)

Creating a Foveated DatasetThe foveated dataset Dfoveated contains only the pixels thatare salient on average over the dataset Although these pixelsdo not necessarily form a compact region of interest (ROI)we will denote the collection of salient pixels as the ROIThe ROI is extracted by using a saliency model based onthe geometric invariants of the structure tensor of the imagesin the dataset D The invariants of the structure tensor areknown to be good predictors of human eye movements forstatic scenes12 In Ref 12 the properties of the image regionsselected by the saccadic eye movements during experimentswere analyzed in terms of higher-order statistics It wasshown that image regions with a statistically less redundantstructure such as the ones given by the signals with intrinsicdimension two contain all the necessary information of astatic scene Therefore signals with intrinsic dimension twoare considered to be more salient The intrinsic dimension(iD) refers to the relation between the degrees of freedom ofa signal domain and the actual degrees of freedom used bya signal Thus signals with i0D are constant within a localwindow signals with i1D can be approximated by a functionof only one variable (eg straight lines edges) and signalswith i2D are for example corners curved lines junctionsand curved edges In this approach we use the geometricinvariant S (determinant) for which the regions of an imagewith S nonzero are i2D

Algorithm 1 sketches the steps of creating the foveateddataset The notations used for the algorithm are included in

Table I Notations for Algorithm 1

Notation Description

D D= I 1 I p contains p data points of dimension DI i Image i with coordinates (x y )J Structure tensorlowast w Convolution with kernel wIx Iy First-order partial derivatives of I i

S Determinant of the structure tensorR Average saliency template Element-wise product of matrixesT i Region of interest for image I i

Dfoveated Dfoveated = T 1 T p

J Imaging Sci Technol 020402-2 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Table I Algorithm 1 computes for the given dataset D thecorresponding foveated dataset Dfoveated of the same size pand dimensionD the only difference is that nonsalient pixelsare set to zero in every image For each image I i in the givendatasetD the geometric invariant Si is computed as shown inline 4 of the algorithm Here Si is defined as the determinantof the structure tensor J 13 (a matrix with the locally averagedproducts of first-order partial derivatives of image I imdashinline 3) Each invariant image Si is normalized to the range[0 1] the normalized Si are then summed over all imagesand the resulting average saliency map is then transformedinto a binary saliency map R based on a threshold θ

It should be noted that during evaluation the recogni-tion rate is computed for different values of θ which resultsin differently sized regions of interest (different numbers ofsalient pixels) Based on the training set an optimal thresholdis chosen By using R we define for each image the regionof interest T i ie an image that contains the original imagevalues where R= 1 and is equal to zero elsewhere

Hierarchical Partitioning of the DatasetWe create a tree with L levels which contains the hierarchicalpartitioning of the dataset The partitioning is performed inthe following way (i) we learn a manifold of dimension NL(corresponding to levelL of the tree) by using PCAand (ii) wecluster the NL-dimensional representation of the data intok clusters using the k-means algorithm In the experimentspresented hereNL increases by 1 at each level of the tree Thestructure of the tree is presented in Figure 1 The root of thetree is defined asD11 and for Lgt 1 the nodes of the tree aredenoted asD(Lminus1)kf

Lkc with kc = 1 k being the current clusterand kf the index of the father node

Considering the representation of the tree shown inFig 1 we initialize the nodes by applying the partitioningfunction presented in Algorithm 2 The notations used inAlgorithm 2 are presented in Table II The partitioning

Figure 1 Hierarchical partitioning of the dataset D The root is definedas D11 Here L is the level of the tree k is the number of clusters kcis the current cluster and kf is the index of the father node Each node

D(Lminus1)kfLkc

of the tree is split into k clusters and contains the learned manifoldin dimension NL NL +1 etc of the corresponding cluster

function expects as input the current cluster currClusterthe number k of clusters and NL the number of principalcomponents The function computes for each node of thetree the matrix U which contains the feature vectors ofdimensionality NL (line 3 of the algorithm) the projecteddata points of the current cluster on the matrix U (line 4 ofthe algorithm) and the index which indicates to which of thek clusters the images in currCluster were clustered with thek-means algorithm (line 5 in the algorithm)We use the indexto create the subset for the kth child we select with the keepfunction in line 7 of Algorithm 2 only the images that belongto cluster j Recursively we compute U Y index and child(line 8 of the algorithm) If the current cluster does not havemore data points than the number of clusters the cluster willbe empty

Hierarchical Sensing of Unknown ScenesThe HMS algorithm uses the hierarchical partitioning ofthe dataset presented before to solve in an efficient waythe sensing problem Therefore an unknown scene ie test

J Imaging Sci Technol 020402-3 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Table II Notations for Algorithm 2

Notation Description

currCluster Current clusterL Current level of the treek Number of clusters for k -meansNL Number of principal components for each LU Matrix containing the feature vectors size (U )= (D times NL )Y Projected data points of currCluster on matrixU

size (Y )= (NL times p )index index = 1 k each Ii in currCluster belongs to cluster indexcurrCluster child Contains the children of the current cluster nodecurrCluster hp ContainsU Y and index for each node of the tree

point xtest outside D which we wish to sense is successivelyprojected on the learned tree We project xtest on the clusterwhich contains the nearest neighbor of the projected testpoint on the learned embedding Algorithm 3 presents thisprocedure The notations used are included in Table IIIThe test point is first projected on the low-dimensionalrepresentation of the rootCluster in line 5 of the algorithmIn line 6 the function find1NN searches for the nearestneighbor nearNeigh of the projected point on the learnedmanifold The classify function checks whether the test pointwas correctly classified (whether the nearest neighbor andthe test point belong to the same class) In line 8 we checkto which cluster the nearest neighbor belongs and we defineit as clusterNN In line 9 of the algorithm the currCluster isupdated and it is given by the child that corresponds to clusterclusterNN

The algorithm continues by projecting the test point onthe next level L of the tree as long as the current cluster is notempty

RESULTSWe first evaluated HMS on the Columbia Object ImageLibrary (COIL-20)14 and Amsterdam Library of ObjectImages (ALOI)15 databases The COIL-20 database contains1440 grayscale images of 20 objects with 72 images foreach object The images have 128 times 128 pixels and weretaken at object-pose intervals of 5 The ALOI database is a

Table III Notations for Algorithm 3

Notation Description

currCluster Current cluster which contains currCluster child and currCluster hpxtest Test data point size (xtest)= 1times Dytest Projected xtest on the low-dimensional representationnearNeigh Nearest neighbor of ytest on the low-dimensional representationrecogHMS Is 1 if xtest and nearNeigh have the same class and 0 otherwiseclusterNN Indicates to which cluster the nearNeigh belongs

color image object recognition benchmark with variation inviewing angle The original dataset contains 1000 differentobjects with 72 viewing angles for each object In ourexperiments we used 50 of all the classes at a quarterresolution 192 times 144 We worked with resized gray-levelimages of size 128times 128 pixels

In order to evaluate the presented HMS method withfoveation and adapted data partitioning we divided thedatasets into training and test data and we computed therecognition rates for the test data by assigning to each testimage the corresponding class of the nearest neighbor Foreach dataset we chose randomly one image per class and wetested them against the other images that belonged to thetraining dataset The goal of HMS is to use as few sensingactions as possible and still obtain the highest possiblerecognition rate Therefore we searched for the minimumnumber of sensing actions that HMS needs to perform inorder to achieve the highest possible recognition rate for allof the tested images

Performance of HMS With and Without Foveation Versus Random Projections

We explored the benefits of the presented approach by comparing HMS, with and without foveation, with the classical CS method, which uses random projections, i.e., a random Gaussian matrix with unit-length rows and fewer rows than the signal has pixels (see the sketch below). In order to do this, we considered the simplest configuration for the hierarchical partitioning of the data: k = 2 clusters and the dimension N_L for the first level of the tree equal to 1. We computed the recognition rate for differently sized regions of interest, e.g., 16% for a foveated dataset, and up to 100% for the original dataset. For both HMS and CS we used the same 1NN classifier.
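Generating such a baseline sensing matrix is straightforward; this sketch (our code, not part of the paper) draws Gaussian rows and normalizes each to unit length:

```python
import numpy as np

def random_sensing_matrix(m, d, seed=0):
    """CS baseline: a random Gaussian sensing matrix with m unit-length
    rows; m is the number of sensing values, d the number of pixels."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, d))
    return A / np.linalg.norm(A, axis=1, keepdims=True)

# e.g., six sensing values from a flattened 128 x 128 image x:
# y = random_sensing_matrix(6, 128 * 128) @ x
```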

Figure 2 shows, for the ALOI database with 20 classes, how the recognition rate depends on the region of interest, i.e., on the number of salient pixels. The curves are plotted for three different numbers of sensing values (one, three, and six), i.e., for L = 1, L = 2, and L = 3, respectively. For L = 1, HMS senses with a sensing matrix of dimension (N_1 × D), where N_1 = 1, i.e., it takes only one sensing value. For L = 2, HMS senses with a sensing matrix of dimension (2 × D), which adds up to 1 + 2 = 3 sensing values, and so on.
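In general, since N_L grows by one per level starting at N_1 = 1, the number of sensing values accumulated after descending L levels is:

```latex
\sum_{\ell=1}^{L} N_\ell = \sum_{\ell=1}^{L} \ell = \frac{L(L+1)}{2},
\qquad \text{i.e., } 1,\ 3,\ 6 \text{ values for } L = 1, 2, 3.
```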

Figure 2. HMS results for the ALOI database with 20 classes. Recognition rates are shown for different regions of interest, with one, three, and six sensing values, respectively. The hierarchical partitioning is performed with k = 2 and N_L = 1 for L = 1.

It should be noted that for L = 1, where only one sensing value is available, foveation deteriorates the result. However, already with three and six sensing values, the recognition performance does not increase with the number of pixels that are considered, i.e., the performance is equally high with an ROI of only 5% of the image.

Figure 3 shows a selection of seven out of 28 HMS sensing basis functions with an ROI of 8% (first row), 16% (second row), and without foveation (third row). It should be noted that the basis functions for the two different ROIs are specific to the test image shown on the right, as each new basis function depends on the previously acquired sensing values. Thus, the basis functions evolve, as we continue sensing, from rather generic to more specific templates. It should also be noted that the ROIs adapt accordingly. In comparison, the third row of Fig. 3 shows the corresponding selection of basis functions for the case where the hierarchical partitioning was computed for the original dataset and not for the foveated dataset as before. Without foveation, the basis functions do also adapt during the hierarchical partitioning, but the adaptation is less specific.

In Figure 4(a)–(c) we present representative results with foveation (ROI = 16%) and without foveation (ROI = 100%) for different benchmarks: COIL with 20 classes and ALOI with 20 and 40 classes. We compare the results of HMS with the results obtained by using a random-projections matrix for sensing with the corresponding number of sensing values. For all methods we computed 100 runs, and we present in Fig. 4 the recognition rate averaged over these runs. As can be seen in Fig. 4(a)–(c), on the ALOI database with 20 and 40 classes and on the COIL-20 database, we are able to reach a recognition rate of 100% with a region of interest of only 16% and six sensing values, which corresponds to a compression ratio greater than 4500 in the case of ALOI and greater than 2500 for COIL. The compression ratio is defined as the ratio between the original size of the images of the respective dataset (number of pixels) and the number of sensing values used for recognition. It should be noted that the recognition performance of HMS is higher than the recognition rate obtained with the CS random-projections approach on the considered databases.
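For concreteness, these ratios follow from the stated image sizes (our arithmetic, assuming the 192 × 144 original ALOI resolution and the 128 × 128 COIL resolution) with six sensing values:

```latex
\text{ALOI: } \frac{192 \times 144}{6} = 4608 > 4500, \qquad
\text{COIL: } \frac{128 \times 128}{6} \approx 2731 > 2500.
```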

We conclude that for small databases, such as COIL and ALOI with 20 and 40 classes, it is sufficient to use the HMS algorithm with a simple configuration: two clusters and N_L = 1 for L = 1. When the database contains more training images, e.g., ALOI with 50 classes, it is worth studying the behavior of HMS for different hierarchical trees, focusing on different numbers of principal components for the first level of the tree, N_1, and on different numbers of clusters, k. We show in Figure 5 the results obtained with HMS for N_1 = 1, 2, and 3 in the case of k = 2 (a) and k = 3 (b). As expected, HMS performs better with a higher number of principal components for the first level of the hierarchical partitioning and with a value of k appropriate to the number of data points in the training dataset.

Figure 3. Selected HMS sensing basis functions (seven out of 28 sensing values) with an ROI of 8% (first row) and 16% (second row) for the foveated dataset, without foveation (ROI of 100%, third row), and the corresponding test image from ALOI with 20 classes. For the hierarchical partitioning we used k = 2 clusters and N_L = 1 for L = 1.

Figure 4. Representative results of HMS with and without foveation versus random projections for different benchmarks.

We also evaluated the algorithm on the highly competitive MNIST [18] benchmark, which consists of handwritten digits from 0 to 9. There are 60,000 images for training and 10,000 for testing.

We first considered the simple configuration for partitioning the data, with N_L = 1 for L = 1 and only k = 2 clusters. Although the overall performance of a sensing and recognition scheme with, for example, L = 12 is limited to a recognition rate of 93.14%, it is interesting to note that, of the 10,000 test images, 2491 are already correctly recognized

Figure 5. Results of HMS for different hierarchical partitionings of the training data from ALOI with 50 classes.

with only one sensing action (L = 1). Of the remaining test images, 2436 are correctly classified with L = 2, 2689 of the remaining images with L = 3, and 1290 of the remaining images with L = 4; i.e., 8906 of the 10,000 test images are recognized within the first four levels. If this scheme is continued up to L = 12 and L = 13, a total of 98.50% and 98.51%, respectively, of the test images are correctly classified. The difference between 98.50% and 93.14% at L = 12 is due to the fact that a few images are misclassified with more sensing values although they would have been correctly classified with fewer. We explored the performance of HMS on the MNIST dataset for different hierarchical partitionings of the training dataset, i.e., with different values of k and N_1. As shown before for the previously considered databases, the recognition rate grows with N_1. We show in Figure 6 the performance of HMS for different values of N_1 in the case of (a) k = 2 and (b) k = 3 clusters. The curves are plotted for different numbers of sensing values, i.e., for L = 1, L = 2, ..., L = 9 in Fig. 6(a), and for L = 1, L = 2, ..., L = 6 in Fig. 6(b). As can be seen in Fig. 6(a), for k = 2 and N_1 = 20, we reach a recognition rate of 96.69% for L = 3, i.e., with 63

Figure 6. Results of HMS on MNIST for a hierarchical partitioning of the training data with N_1 = 7, 9, 14, 20 and (a) k = 2, (b) k = 3, and using accumulated sensing values (Acc) over the different levels of the sensing tree.

sensing values, and for L = 4 we reach a higher recognition rate of 96.82%.

If we accumulate the sensing values over the different levels of the sensing tree, the recognition rate improves, as shown in Fig. 6. With k = 2 and N_1 = 20, HMS reaches a recognition rate of 96.88% with 63 sensing values and 96.93% with 86 sensing values.
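These counts are consistent with N_1 = 20 and one additional component per level, since after L levels:

```latex
\sum_{\ell=0}^{L-1} (20 + \ell) = 20L + \frac{L(L-1)}{2},
\qquad \text{i.e., } 63 \text{ values for } L = 3 \text{ and } 86 \text{ for } L = 4.
```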

In Figure 7 we show the results for HMS compared with CS. The parameters for HMS are N_1 = 20 and k = 2, and we computed only 10 runs. It should be noted that HMS outperforms CS. The region of interest of 23% seems not to contain sufficient salient pixels to reach a higher performance than HMS without foveation, as was the case for the other tested databases.

Figure 8 shows a selection of five sensing basis functions for MNIST obtained with a hierarchical partitioning with k = 2 and N_1 = 9. In the first column, sample test images with the corresponding number and class are shown, and on each

Figure 7. Results on MNIST for HMS versus random projections.

line, the corresponding sensing basis functions are shown for different numbers of sensing values, i.e., for L = 1, L = 3, L = 7, L = 8, and L = 9. One can note the evolution from rather generic to more specific templates.

Performance of HMS Versus MS and FMS

In Ref. 7 it was shown that, for the ALOI database with 20 classes, MS needs 38 sensing values in order to reach a 100% recognition rate. In Ref. 8, FMS reaches only a 65% recognition rate with 15 sensing values. Here we showed that for a 100% recognition rate only six sensing values are needed for HMS with foveation, and ten sensing values for HMS without foveation. Both MS and FMS strongly depend on the number of neighbors selected for the Locally Linear Embedding used to learn the manifolds, on the decreasing size of the adaptive dataset, and on the dimension of the manifolds at each iteration of the algorithm.

The important difference between HMS and MS/FMS is that with HMS the partitioning of the dataset is performed prior to sensing. As a consequence, finding the optimal parameters for the partitioning is more difficult for MS and FMS.

Performance of HMS Versus Other Methods

In 2014, Dornaika et al. [16] developed a semi-supervised feature extraction algorithm with an out-of-sample extension, which they applied to a subset of the COIL-20 database (18 of the 72 images available for each object). They randomly selected 50% of the data as the training dataset and the rest as the test dataset. From the training dataset, they randomly labeled one, two, or three samples per class, and the rest of the data were used as unlabeled data. The data are first preprocessed: PCA is computed in order to preserve 98% of the energy of the dataset. Ref. 16 provides a comparison between methods that are based on label propagation and on graph-based semi-supervised embedding. They report the best average classification results over ten random splits for their method with three labeled samples, for unlabeled (80.4%) and test data (77.4%). They also

Figure 8. Selected HMS sensing basis functions without foveation (second to fifth column, for L = 1, 3, 7, 8, 9) and the corresponding sample test image (first column) from the MNIST dataset. For the hierarchical partitioning we used k = 2 clusters and N_L = 9 for L = 1.

show that when one labeled sample per class is used, their method reaches a 61% recognition rate with around 19 feature dimensions. In order to compare HMS with the approach proposed in Ref. 16, we divided the COIL dataset, with 72 images per class, into training and test datasets in a way similar to that described before. Although the training dataset consists of only 720 images, HMS performs better than the semi-supervised feature extraction algorithm of Ref. 16. Thus, for a hierarchical partitioning of the training data with

k = 2 and N_1 = 1, HMS reaches an average recognition rate (over ten random splits of the data) of 94.98% with 15 sensing values. If the partitioning is performed with the same number of clusters but N_1 = 2, a higher recognition rate of 95.98% at L = 5 is reached with 20 sensing values.

A recent article [17] presents an out-of-sample generalization algorithm for supervised manifold learning for classification, which is evaluated on the COIL-20 dataset. The authors use 71 images for each of the 20 objects in

COIL, which they normalize, convert to grayscale, and downsample to a resolution of 32 × 32 pixels. The algorithm embeds the images in a 25-dimensional space. They obtain a minimum average misclassification rate over five runs of approximately 2%. We compared HMS with the approach of Ref. 17 and obtained an average misclassification rate of 4.2% with ten sensing values and a 100% recognition rate with 15 sensing values. For the hierarchical partitioning we used k = 2 and N_1 = 1. Thus, HMS reaches a 100% recognition rate with even fewer sensing values than the number of dimensions used in Ref. 17.

On the MNIST database, a baseline method [18], which feeds the 40-dimensional PCA feature vector of the training data into a second-degree polynomial classifier, has an error rate of 3.3%, compared with our 3.32% with 41 sensing values and 3.12% with 63 sensing values, as shown in Fig. 6(a).

State-of-the-art performance on MNIST has been achieved by a recurrent convolutional neural network (RCNN) approach, which introduces recurrent connections into each convolutional layer [19]. The approach reaches a testing error of 0.31% and uses 670,000 parameters. As argued in the following section, our goal is to explore the HMS algorithm in terms of the best recognition rate reached with as few sensing actions as possible, rather than to increase complexity for maximum performance.

CONCLUSIONS AND FUTURE WORK

We have presented a novel algorithm which aims at efficient sensing, and we have evaluated its efficiency in terms of the resulting recognition performance. We assume the availability of a dataset of images that represent the type of objects and scenes that need to be sensed and recognized. Based on these data, the goal is to learn a sensing strategy such that recognition is possible with few sensing values. Although we use a very simple nearest-neighbor classifier, on easy benchmarks such as COIL and ALOI with only some of the classes, perfect recognition is possible with only about ten sensing values. On harder benchmarks such as MNIST, state-of-the-art performance could not be reached, but we could show that a large number of test images can be recognized with only very few sensing values. Such performance resembles human performance, since humans can effortlessly recognize a multitude of objects based on just the gist of a scene and require scrutiny only for less familiar objects and more difficult recognition tasks.

A further bio-inspired element of our algorithm is foveation, and we have shown that gist-like sensing and recognition requires the whole image, whereas more refined sensing can be reduced to only a few salient locations without deteriorating recognition performance. It should be noted, however, that we do not use the saliency of the actual image but only the average saliency of all images in the hierarchical dataset. In terms of compressive sensing, we have proposed a sensing scheme with a learned sensing matrix, and we have shown that it leads to better recognition performance than random sensing. In the case of foveation, the sensing matrix is also sparse, in addition to being learned, i.e., adapted to the specific dataset.

A weakness of the proposed approach is that it offers a rather high number of choices. For example, it is not obvious in which dimension to start sensing, or how to increase the dimension of the embedding manifolds. In future work, however, this weakness could be turned into a benefit by exploring different strategies. A further weakness is that the method, like many others, will be less efficient if many different objects need to be recognized in the same scene. This scenario would most likely require a preliminary stage of object detection and rough object segmentation.

It should be noted that we are here addressing a problem that is not typically posed in current computer-vision challenges but is becoming increasingly relevant as computer-vision systems become more pervasive. While the current focus is on maximizing recognition performance, an equally challenging problem consists of finding the simplest solution to a given problem. Simplicity can be defined in different ways; we here adopt the approach of using a minimum number of sensing values and a simple classifier. This reduces both the required bandwidth of the sensor and the required processing power.

ACKNOWLEDGMENTS

This research was funded by the Graduate School for Computing in Medicine and Life Sciences, funded by Germany's Excellence Initiative [DFG GSC 235/1].

REFERENCES

1. E. Candès and M. Wakin, "An introduction to compressive sampling," IEEE Signal Process. Mag. 25, 21–30 (2008).
2. B. Olshausen and D. Field, "Natural image statistics and efficient coding," Netw. Comput. Neural Syst. 7, 333–339 (1996).
3. D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory 52, 1289–1306 (2006).
4. H. Schütze, E. Barth, and T. Martinetz, "An adaptive hierarchical sensing scheme for sparse signals," Proc. SPIE 9014, 151–8 (2014).
5. R. Baraniuk and M. Wakin, "Random projections of smooth manifolds," Found. Comput. Math. 9, 51–77 (2009).
6. M. Chen, J. Silva, J. Paisley, C. Wang, D. Dunson, and L. Carin, "Compressive sensing on manifolds using a nonparametric mixture of factor analyzers: algorithm and performance bounds," IEEE Trans. Signal Process. 58, 6140–6155 (2010).
7. I. Burciu, A. Ion-Margineanu, T. Martinetz, and E. Barth, "Visual manifold sensing," Proc. SPIE 9014, 481–8 (2014).
8. I. Burciu, T. Martinetz, and E. Barth, "Foveated manifold sensing for object recognition," Proc. IEEE Black Sea Conf. on Communications and Networking (2015), pp. 196–200.
9. S. Roweis and L. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science 290, 2323–2326 (2000).
10. N. Tajunisha and V. Saravanan, "An efficient method to improve the clustering performance for high dimensional data by Principal Component Analysis and modified K-means," International Journal of Database Management Systems 3 (2011).
11. D. Arthur and S. Vassilvitskii, "K-means++: the advantages of careful seeding," SODA '07: Proc. Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (2007), pp. 1027–1035.
12. C. Zetzsche, K. Schill, H. Deubel, G. Krieger, E. Umkehrer, and S. Beinlich, "Investigation of a sensorimotor system for saccadic scene analysis: an integrated approach," From Animals to Animats 5: Proc. Fifth Int. Conf. on Simulation of Adaptive Behavior, edited by R. Pfeifer et al. (MIT Press, 1998), Vol. 5, pp. 120–126.
13. B. Jähne, H. Haußecker, and P. Geißler, Handbook of Computer Vision and Applications (Academic Press, 1999), Vol. 2.

14. S. A. Nene, S. K. Nayar, and H. Murase, Columbia Object Image Library (COIL-20), Technical Report CUCS-005-96 (1996).
15. J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders, "The Amsterdam Library of Object Images," Int. J. Comput. Vis. 61, 103–112 (2005).
16. F. Dornaika, Y. El Traboulsi, B. Cases, and A. Assoum, "Image classification via semi-supervised feature extraction with out-of-sample extension," Advances in Visual Computing, ISVC 2014, Part I (2014), Vol. 8887, pp. 182–192.
17. E. Vural and C. Guillemot, "Out-of-sample generalizations for supervised manifold learning for classification," http://arxiv.org/abs/1502.02410 [cs.CV] (2015).
18. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE 86, 2278–2324 (1998).
19. M. Liang and X. Hu, "Recurrent convolutional neural network for object recognition," IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (IEEE, Piscataway, NJ, 2015).

J Imaging Sci Technol 020402-10 Mar-Apr 2016

All in-text references underlined in blue are linked to publications on ResearchGate letting you access and read them immediatelyAll in-text references underlined in blue are linked to publications on ResearchGate letting you access and read them immediately

  • T1
  • F1
  • T2
  • T3
  • F3
  • F2
  • F4
  • F5
  • F6
  • F7
  • F8
  • B1
  • B2
  • B3
  • B4
  • B5
  • B6
  • B7
  • B8
  • B9
  • B10
  • B11
  • B12
  • B13
  • B14
  • B15
  • B16
  • B17
  • B18
  • B19
Page 3: Hierarchical Manifold Sensing with Foveation and Adaptive ...webmail.inb.uni-luebeck.de/inb-publications/pdfs/BuMaBa16.pdf · Baraniuk presented a theoretical analysis of CS for manifolds

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Table I Algorithm 1 computes for the given dataset D thecorresponding foveated dataset Dfoveated of the same size pand dimensionD the only difference is that nonsalient pixelsare set to zero in every image For each image I i in the givendatasetD the geometric invariant Si is computed as shown inline 4 of the algorithm Here Si is defined as the determinantof the structure tensor J 13 (a matrix with the locally averagedproducts of first-order partial derivatives of image I imdashinline 3) Each invariant image Si is normalized to the range[0 1] the normalized Si are then summed over all imagesand the resulting average saliency map is then transformedinto a binary saliency map R based on a threshold θ

It should be noted that during evaluation the recogni-tion rate is computed for different values of θ which resultsin differently sized regions of interest (different numbers ofsalient pixels) Based on the training set an optimal thresholdis chosen By using R we define for each image the regionof interest T i ie an image that contains the original imagevalues where R= 1 and is equal to zero elsewhere

Hierarchical Partitioning of the DatasetWe create a tree with L levels which contains the hierarchicalpartitioning of the dataset The partitioning is performed inthe following way (i) we learn a manifold of dimension NL(corresponding to levelL of the tree) by using PCAand (ii) wecluster the NL-dimensional representation of the data intok clusters using the k-means algorithm In the experimentspresented hereNL increases by 1 at each level of the tree Thestructure of the tree is presented in Figure 1 The root of thetree is defined asD11 and for Lgt 1 the nodes of the tree aredenoted asD(Lminus1)kf

Lkc with kc = 1 k being the current clusterand kf the index of the father node

Considering the representation of the tree shown inFig 1 we initialize the nodes by applying the partitioningfunction presented in Algorithm 2 The notations used inAlgorithm 2 are presented in Table II The partitioning

Figure 1 Hierarchical partitioning of the dataset D The root is definedas D11 Here L is the level of the tree k is the number of clusters kcis the current cluster and kf is the index of the father node Each node

D(Lminus1)kfLkc

of the tree is split into k clusters and contains the learned manifoldin dimension NL NL +1 etc of the corresponding cluster

function expects as input the current cluster currClusterthe number k of clusters and NL the number of principalcomponents The function computes for each node of thetree the matrix U which contains the feature vectors ofdimensionality NL (line 3 of the algorithm) the projecteddata points of the current cluster on the matrix U (line 4 ofthe algorithm) and the index which indicates to which of thek clusters the images in currCluster were clustered with thek-means algorithm (line 5 in the algorithm)We use the indexto create the subset for the kth child we select with the keepfunction in line 7 of Algorithm 2 only the images that belongto cluster j Recursively we compute U Y index and child(line 8 of the algorithm) If the current cluster does not havemore data points than the number of clusters the cluster willbe empty

Hierarchical Sensing of Unknown ScenesThe HMS algorithm uses the hierarchical partitioning ofthe dataset presented before to solve in an efficient waythe sensing problem Therefore an unknown scene ie test

J Imaging Sci Technol 020402-3 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Table II Notations for Algorithm 2

Notation Description

currCluster Current clusterL Current level of the treek Number of clusters for k -meansNL Number of principal components for each LU Matrix containing the feature vectors size (U )= (D times NL )Y Projected data points of currCluster on matrixU

size (Y )= (NL times p )index index = 1 k each Ii in currCluster belongs to cluster indexcurrCluster child Contains the children of the current cluster nodecurrCluster hp ContainsU Y and index for each node of the tree

point xtest outside D which we wish to sense is successivelyprojected on the learned tree We project xtest on the clusterwhich contains the nearest neighbor of the projected testpoint on the learned embedding Algorithm 3 presents thisprocedure The notations used are included in Table IIIThe test point is first projected on the low-dimensionalrepresentation of the rootCluster in line 5 of the algorithmIn line 6 the function find1NN searches for the nearestneighbor nearNeigh of the projected point on the learnedmanifold The classify function checks whether the test pointwas correctly classified (whether the nearest neighbor andthe test point belong to the same class) In line 8 we checkto which cluster the nearest neighbor belongs and we defineit as clusterNN In line 9 of the algorithm the currCluster isupdated and it is given by the child that corresponds to clusterclusterNN

The algorithm continues by projecting the test point onthe next level L of the tree as long as the current cluster is notempty

RESULTSWe first evaluated HMS on the Columbia Object ImageLibrary (COIL-20)14 and Amsterdam Library of ObjectImages (ALOI)15 databases The COIL-20 database contains1440 grayscale images of 20 objects with 72 images foreach object The images have 128 times 128 pixels and weretaken at object-pose intervals of 5 The ALOI database is a

Table III Notations for Algorithm 3

Notation Description

currCluster Current cluster which contains currCluster child and currCluster hpxtest Test data point size (xtest)= 1times Dytest Projected xtest on the low-dimensional representationnearNeigh Nearest neighbor of ytest on the low-dimensional representationrecogHMS Is 1 if xtest and nearNeigh have the same class and 0 otherwiseclusterNN Indicates to which cluster the nearNeigh belongs

color image object recognition benchmark with variation inviewing angle The original dataset contains 1000 differentobjects with 72 viewing angles for each object In ourexperiments we used 50 of all the classes at a quarterresolution 192 times 144 We worked with resized gray-levelimages of size 128times 128 pixels

In order to evaluate the presented HMS method withfoveation and adapted data partitioning we divided thedatasets into training and test data and we computed therecognition rates for the test data by assigning to each testimage the corresponding class of the nearest neighbor Foreach dataset we chose randomly one image per class and wetested them against the other images that belonged to thetraining dataset The goal of HMS is to use as few sensingactions as possible and still obtain the highest possiblerecognition rate Therefore we searched for the minimumnumber of sensing actions that HMS needs to perform inorder to achieve the highest possible recognition rate for allof the tested images

Performance of HMSWith andWithout Foveation VersusRandom ProjectionsWe explored the benefits of the presented approach bycomparing HMS with and without foveation with theclassical CS method which uses random projections ie arandom Gaussian matrix with rows that have unit lengthand a smaller number of components In order to do thiswe considered the simplest configuration for the hierarchicalpartitioning of the data k = 2 clusters and the dimensionNL for the first level of the tree equal to 1 We computedthe recognition rate for differently sized regions of interesteg 16 for a foveated dataset and up to 100 for theoriginal dataset For both HMS and CS we used the same1NN classifier

Figure 2 shows for the database ALOI with 20 classeshow the recognition rate depends on the region of interestie the number of salient pixels The curves are plottedfor three different numbers (one three and six) of sensingvalues ie for L = 1 L = 2 and L = 3 respectively ForL = 1 HMS senses with a sensing matrix of dimension(N1timesD) where N1 = 1 ie it takes only one sensing valueFor L = 2 HMS senses with a sensing matrix of (2times D)which adds up to 1 + 2 = 3 sensing values and so on Itshould be noted that for L = 1 where only one sensingvalue is available foveation deteriorates the result However

J Imaging Sci Technol 020402-4 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Figure 2 The HMS results for the ALOI 20 database Recognition ratesare shown for different regions of interest with one three and six sensingvalues respectively The hierarchical partitioning is performed with k = 2and NL = 1 for L = 1

already with three and six sensing values the recognitionperformance does not increase with the number of pixels thatare considered ie the performance is equally high with anROI of only 5 of the image

Figure 3 shows a selection of seven out of 28 HMSsensing basis functions with an ROI of 8 (first row) 16(second row) and without foveation (third row) It shouldbe noted that the basis functions for the two different ROIsare specific to the test image shown on the right as each newbasis function depends on the previously acquired sensingvalues Thus the basis functions evolve as we continuesensing from rather generic to more specific templates Itshould also be noted that the ROIs adapt accordingly Incomparison the third row of Fig 3 shows the correspondingselection of basis function for the case where the hierarchicalpartitioning was computed for the original dataset and notfor the foveated dataset as shown before It should be noted

that without foveation the basis functions do also adaptduring the hierarchical partitioning but the adaption is lessspecific

In Figure 4 (a)ndash(c) we present representative resultswith foveation (ROI= 16) and without foveation (ROI=100) for different benchmarks COIL with 20 classes andALOI with 20 and 40 classes We compare the results ofHMSwith the results obtained by using a randomprojectionsmatrix for sensingwith the corresponding number of sensingvalues For allmethodswe computed 100 runs andwe presentin Fig 4 the average of the recognition rate over these runsAs can be seen in Fig 4 (a)ndash(c) on the ALOI database with20 and 40 classes and the COIL-20 database we are able toreach a recognition rate of 100 with a region of interestof only 16 and six sensing values which corresponds toa compression ratio greater than 4500 in the case of ALOIand greater than 2500 for COIL The compression ratio isdefined by the ratio between the original size of the imagesof the respective dataset (number of pixels) and the numberof sensing values used for recognition It should be notedthat the recognition performance of HMS is higher than therecognition rate obtained with the CS random projectionsapproach on the considered databases

We conclude that for small databases such as COILand ALOI with 20 and 40 classes it is sufficient to usethe HMS algorithm with a simple configuration a numberof two clusters and NL = 1 for L = 1 When the databasecontains more training images eg ALOI with 50 classesit is worth studying the evolution of HMS for differenthierarchical trees focussing on the different number ofprincipal components of the first level of the tree N1 andfor a different number of clusters k We show in Figure 5the results obtained with HMS for N1 = 1 2 and 3 in thecase of k= 2 (a) and k= 3 (b) As expected HMS performsbetter with a higher number of principal components for thefirst level of the hierarchical partitioning and with a proper kconsidering the number of data points in the training dataset

Figure 3 Selected HMS sensing basis functions (seven out of 28 sensing values) with an ROI of 8 (first row) 16 (second row) for the foveated datasetwithout foveation (ROI of 100mdashthird row) and the corresponding test image from ALOI with 20 classes For the hierarchical partitioning we used k = 2clusters and NL = 1 for L = 1

J Imaging Sci Technol 020402-5 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Figure 4 Representative results of HMS with and without foveation versusrandom projections for different benchmarks

We also evaluated the algorithm on the highly compet-itive MNIST18 benchmark which consists of handwrittendigits from 0 to 9 There are 60000 images for training and10000 for testing

We first considered the simple configuration for par-titioning the data with NL = 1 for L = 1 and only k = 2clusters Although the overall performance of a sensing andrecognition scheme with for example L = 12 is limited toa recognition rate of 9314 it is interesting to note that ofthe 10000 test images 2491 are already correctly recognized

Figure 5 Results of HMS for different hierarchical partitionings of thetraining data from ALOI with 50 classes

with only one sensing action (L= 1) Of the remaining testimages 2436 are correctly classified with L= 2 2689 of theremaining images with L = 3 and 1290 of the remainingimages with L= 4 If this scheme is continued up to L= 12and L= 13 a total of 9850 and 9851 respectively of thetest images are correctly classified The difference between9850 and 9314 at L = 12 is due to the fact that afew images are obviously misclassified with more sensingvalues although they would have been correctly classifiedwith fewer We explored the performance of HMS on theMNIST dataset for different hierarchical partitionings of thetraining dataset ie with different values of k and N1 Asshown before for the previously considered databases therecognition rate grows with N1 We show in Figure 6 theperformance of HMS for different values of N1 in the caseof (a) k= 2 and (b) k= 3 clusters The curves are plotted fordifferent numbers of sensing values ie forL= 1L= 2 and L= 9 in Fig 6 (a) and in Fig 6 (b) for L= 1 L= 2 and L= 6 As can be seen in Fig 6 (a) for k= 2 andN1 = 20we reach a recognition rate of 9669 for L= 3 ie with 63

J Imaging Sci Technol 020402-6 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Figure 6 Results of HMS on MNIST for a hierarchical partitioning of thetraining data with N1 = 791420 and (a) k = 2 (b) k = 3 and usingaccumulated sensing values (Acc) over the different levels of the sensingtree

sensing values and for L= 4 we reach a higher recognitionrate of 9682

If we accumulate the sensing values over the differentlevels of the sensing tree the recognition rate improves asshown in Fig 6 With k = 2 and N1 = 20 HMS reaches arecognition rate of 9688with 63 sensing values and 9693with 86 sensing values

In Figure 7 we show the results for HMS compared withCS The parameters for HMS are N1 = 20 and k = 2 andwe computed only 10 runs It should be noted that HMSoutperforms CS The region of interest of 23 seems not tocontain sufficient salient pixels in order to reach a higherperformance than HMS without foveation as in the case ofthe other tested databases

Figure 8 shows a selection of five sensing basis functionsfor MNIST obtained with a hierarchical partitioning withk= 2 andN1 = 9 In the first column sample test imageswiththe corresponding number and class are shown and on each

Figure 7 Results on MNIST for HMS versus random projections

line the corresponding sensing basis functions for a differentnumber of sensing values ie for L= 1 L= 3 L= 7 L= 8and L= 9 One can note the evolution from rather generic tomore specific templates

Performance of HMS Versus MS and FMSIn Ref 7 it has been shown that for the ALOI database with20 classes MS needs 38 sensing values in order to reach100 recognition rate In Ref 8 FMS reaches only 65recognition rate with 15 sensing values Here we showedthat for a 100 recognition rate only six sensing values areneeded for HMS with foveation and ten sensing values forHMS without foveation Both MS and FMS strongly dependon the number of neighbors selected for the Locally LinearEmbedding used to learn the manifolds on the decreasingsize of the adaptive dataset and on the dimension of themanifolds at each iteration of the algorithm

The important difference between HMS andMSFMS isthat with HMS the partitioning of the dataset is performedprior to sensing As a consequence finding the optimalparameters for the partitioning is more difficult for MS andFMS

Performance of HMS Versus Other MethodsIn 2014 Dornaika et al16 developed a semi-supervisedfeature extractionwith an out-of-sample extension algorithmwhich they applied on a subset of the COIL-20 (18 imagesfrom 72 available for each object) database They randomlyselected 50 of the data as the training dataset and therest as the test dataset From the training dataset theyrandomly labeled one two and three samples per classand the rest of the data were used as unlabeled data Thedata are first preprocessed PCA is computed in order topreserve 98 of the energy of the dataset The work Ref 16provides a comparison between methods that are basedon label propagation and on graph-based semi-supervisedembedding They report the best average classification resultson ten random splits for their method for three label samplesand for unlabeled (804) and test data (774) They also

J Imaging Sci Technol 020402-7 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Figure 8 Selected HMS sensing basis functions without foveation (second to fifth columnmdashfor L = 13789) and the corresponding sample test image(first column) from the MNIST dataset For the hierarchical partitioning we used k = 2 clusters and NL = 9 for L = 1

show that when one labeled sample per class is used theirmethod reaches 61 recognition rate with around 19 featuredimensions In order to compare HMS with the approachproposed in Ref 16 we divided the COIL dataset with 72objects per class into training and test datasets in a similarway to that described before Although the training datasetconsists only of 720 images HMS performs better thanthe semi-supervised feature extraction algorithm in Ref 16Thus for a hierarchical partitioning of the training data with

k= 2 and N1 = 1 HMS reaches an average recognition rate(over ten random splits of the data) of 9498with 15 sensingvalues If the partitioning is performedwith the samenumberof clusters but N1 = 2 a higher recognition rate of 9598 atL= 5 is reached with 20 sensing values

A recent article17 presents an out-of-sample gener-alization algorithm for supervised manifold learning forclassification which is evaluated on the COIL-20 datasetThe authors use 71 images for each of the 20 objects in

J Imaging Sci Technol 020402-8 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

COIL which they normalize convert to grayscale anddownsample to a resolution of 32times 32 pixels The algorithmembeds the images in a 25-dimensional space They obtaina minimum average misclassification rate over five runs ofapproximately 2 We compared HMS with the approach inRef 17 andwe obtained an averagemisclassification ratewithten sensing values of 42 and 100 recognition rate with15 sensing values For the hierarchical partitioning we usedk= 2 andN1 = 1 Thus HMS reaches 100 recognition ratewith even fewer sensing values than in Ref 17

On the MNIST database a baseline method18 whichuses as input to a second-degree polynomial classifier the40-dimensional feature vector of the training data obtainedwith PCA has an error rate of 33 compared with our 332with 41 sensing values and 312 with 63 sensing values asshown in Fig 6 (a)

State-of-the-art performance on MNIST has beenachieved by a recurrent convolutional neural network(RCNN) approach which introduces recurrent connectionsinto each convolutional layer19 The approach reaches atesting error of 031 and uses 670000 parameters Asargued in the following section our goal is to explore theHMS algorithm in terms of the best recognition rate reachedwith as few sensing actions as possible rather than increasingcomplexity for maximum performance

CONCLUSIONS AND FUTUREWORKWe have presented a novel algorithm which aims at efficientsensing and have evaluated the efficiency in terms ofthe resulting recognition performance We assume theavailability of a dataset of images that represent the type ofobjects and scenes that need to be sensed and recognizedBased on these data the goal is to learn a sensing strategysuch that recognition is possible with few sensing valuesAlthough we use a very simple nearest-neighbor classifieron easy benchmarks such as COIL and ALOI with onlysome classes perfect recognition is possible with onlyabout ten sensing values On harder benchmarks such asMNIST state-of-the-art performance could not be reachedbut we could show that a large number of test imagescould be recognized with only very few sensing valuesSuch performance resembles human performance sincehumans can effortlessly recognize a multitude of objectsbased on just the gist of a scene and require scrutiny forless familiar objects and more difficult recognition tasks Afurther bio-inspired element of our algorithm is foveationand we have shown that gist-like sensing and recognitionrequires the whole image whereas more refined sensing canbe reduced to only few salient locations without deterioratingrecognition performance It should be noted however thatwe do not use the saliency of the actual image but only theaverage saliency of all images in the hierarchical datasetIn terms of compressive sensing we have here proposed asensing scheme with a learned sensing matrix and we haveshown that it leads to better recognition performance thanrandom sensing In the case of foveation the sensing matrixis also sparse in addition to being learned ie adapted to thespecific dataset

A weakness of the proposed approach is that it offers arather high number of choices For example it is not obviousin which dimension to start sensing and how to increasethe dimension of the embedding manifolds In future workhowever this weakness could be turned into a benefit byexploring different strategies A further weakness is thatthe method like many others will be less efficient if manydifferent objects need to be recognized in the same sceneThis scenario would most likely require a preliminary stageof object detection and rough object segmentation

It should be noted that we are here addressing aproblem that is not typically addressed in current computer-vision challenges but is becoming increasingly relevantas computer-vision systems are becoming more pervasiveWhile currently the focus is on maximizing recognitionperformance an equally challenging problem consists offinding the simplest solution for a given problem Simplicitycan be defined in different ways and we here adopt theapproach of using a minimum number of sensing values anda simple classifier This reduces both the required bandwidthof the sensor and the required processing power

ACKNOWLEDGMENTSThis research is funded by the Graduate School for Com-puting in Medicine and Life Sciences funded by GermanyrsquosExcellence Initiative [DFG GSC 2351]

REFERENCES1 E Candegraves and M Wakin lsquolsquoIntroduction to compressive samplingrsquorsquo IEEESignal Process Mag 25 21ndash30 (2008)

2 B Olshausen and D Field lsquolsquoNatural image statistics and efficient codingrsquorsquoNetw Comput Neural Syst 7 333ndash339 (1996)

3 D Donoho lsquolsquoCompressed sensingrsquorsquo IEEE Trans Inf Theory 52 1289ndash1306 (2006)

4 H Schuumltze E Barth and T Martinetz lsquolsquoAn adaptive hierarchical sensingscheme for sparse signalsrsquorsquo Proc SPIE 9014 151ndash8 (2014)

5 R Baraniuk and M Wakin lsquolsquoRandom projections of smooth manifoldsrsquorsquoFound Comput Math 9 51ndash77 (2009)

6 M Chen J Silva J Paisley C Wang D Dunson and L Carin lsquolsquoCom-pressive sensing on manifolds using a nonparametric mixture of factoranalyzers algorithm and performance boundsrsquorsquo IEEE Trans SignalProcess 58 6140ndash6155 (2010)

7 I Burciu A Ion-Margineanu T Martinetz and E Barth lsquolsquoVisual mani-fold sensingrsquorsquo Proc SPIE 9014 481ndash8 (2014)

8 I Burciu T Martinetz and E Barth lsquolsquoFoveated manifold sensing forobject recognitionrsquorsquo Proc IEEE Black Sea Conf on Communications andNetworking (2015) pp 196ndash200

9 S Roweis and L Saul lsquolsquoNonlinear dimensionality reduction by locallylinear embeddingrsquorsquo Science 290 2323ndash2326 (2000)

10 N Tajunisha and V Saravanan lsquolsquoAn efficient method to improvethe clustering performance for high dimensional data by PrincipalComponent Analysis and modified K-meansrsquorsquo International Journal ofDatabase Management Systems 3 (2011)

11 D Arthur and S Vassilvitskii lsquolsquoK-means++ the advantages of carefulseedingrsquorsquo SODA rsquo07 Proc Eighteenth Annual ACM-SIAM symposium onDiscrete Algorithms (2007) pp 1027ndash1035

12 C Zetzsche K Schill H Deubel G Krieger E Umkehrer andS Beinlich lsquolsquoInvestigation of a sensorimotor system for saccadic sceneanalysis an integrated approachrsquorsquo From Animals to Animats 5 Proc FifthInt Conf on Simulation of Adaptive Behavior edited by R Pfeifer et al(MIT Press 1998) Vol 5 pp 120ndash126

13 B Jaumlhne HHauszligecker and P GeiszliglerHandbook of Computer Vision andApplications (Academic Press 1999) Vol 2

J Imaging Sci Technol 020402-9 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

14 S A Nene S K Nayar and H Murase Columbia Object Image Library(COIL-20) Technical Report CUCS-005-96 (1996)

15 J M Geusebroek G J Burghouts and A W M Smeulders lsquolsquoThe Ams-terdamLibrary ofObject Imagesrsquorsquo Int J Comput Vis 61 103ndash112 (2005)

16 F Dornaika Y El Traboulsi B Cases and A Assoum lsquolsquoImageclassification via semi-supervised feature extraction with out-of-sampleextensionrsquorsquo Advances in Visual Computing ISCV Part I (2014) Vol 8887pp 182ndash192

17 E Vural and C Guillemot Out-of-sample generalizations for supervisedmanifold learning for classification httparxivorgabs150202410[csCV] (2015)

18 Y LeCun L Bottou Y Bengio and P Haffner lsquolsquoGradient-based learningapplied to document recognitionrsquorsquo Proc IEEE 86 2278ndash2324 (1998)

19 M Liang and X Hu lsquolsquoRecurrent convolutional neural network for objectrecognitionrsquorsquo IEEE Conf on Computer Vision and Pattern Recognition(CVPR) (IEEE Piscataway NJ 2015)

J Imaging Sci Technol 020402-10 Mar-Apr 2016

All in-text references underlined in blue are linked to publications on ResearchGate letting you access and read them immediatelyAll in-text references underlined in blue are linked to publications on ResearchGate letting you access and read them immediately

  • T1
  • F1
  • T2
  • T3
  • F3
  • F2
  • F4
  • F5
  • F6
  • F7
  • F8
  • B1
  • B2
  • B3
  • B4
  • B5
  • B6
  • B7
  • B8
  • B9
  • B10
  • B11
  • B12
  • B13
  • B14
  • B15
  • B16
  • B17
  • B18
  • B19
Page 4: Hierarchical Manifold Sensing with Foveation and Adaptive ...webmail.inb.uni-luebeck.de/inb-publications/pdfs/BuMaBa16.pdf · Baraniuk presented a theoretical analysis of CS for manifolds

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Table II Notations for Algorithm 2

Notation Description

currCluster Current clusterL Current level of the treek Number of clusters for k -meansNL Number of principal components for each LU Matrix containing the feature vectors size (U )= (D times NL )Y Projected data points of currCluster on matrixU

size (Y )= (NL times p )index index = 1 k each Ii in currCluster belongs to cluster indexcurrCluster child Contains the children of the current cluster nodecurrCluster hp ContainsU Y and index for each node of the tree

point xtest outside D which we wish to sense is successivelyprojected on the learned tree We project xtest on the clusterwhich contains the nearest neighbor of the projected testpoint on the learned embedding Algorithm 3 presents thisprocedure The notations used are included in Table IIIThe test point is first projected on the low-dimensionalrepresentation of the rootCluster in line 5 of the algorithmIn line 6 the function find1NN searches for the nearestneighbor nearNeigh of the projected point on the learnedmanifold The classify function checks whether the test pointwas correctly classified (whether the nearest neighbor andthe test point belong to the same class) In line 8 we checkto which cluster the nearest neighbor belongs and we defineit as clusterNN In line 9 of the algorithm the currCluster isupdated and it is given by the child that corresponds to clusterclusterNN

The algorithm continues by projecting the test point onthe next level L of the tree as long as the current cluster is notempty

RESULTSWe first evaluated HMS on the Columbia Object ImageLibrary (COIL-20)14 and Amsterdam Library of ObjectImages (ALOI)15 databases The COIL-20 database contains1440 grayscale images of 20 objects with 72 images foreach object The images have 128 times 128 pixels and weretaken at object-pose intervals of 5 The ALOI database is a

Table III Notations for Algorithm 3

Notation Description

currCluster Current cluster which contains currCluster child and currCluster hpxtest Test data point size (xtest)= 1times Dytest Projected xtest on the low-dimensional representationnearNeigh Nearest neighbor of ytest on the low-dimensional representationrecogHMS Is 1 if xtest and nearNeigh have the same class and 0 otherwiseclusterNN Indicates to which cluster the nearNeigh belongs

color image object recognition benchmark with variation inviewing angle The original dataset contains 1000 differentobjects with 72 viewing angles for each object In ourexperiments we used 50 of all the classes at a quarterresolution 192 times 144 We worked with resized gray-levelimages of size 128times 128 pixels

In order to evaluate the presented HMS method withfoveation and adapted data partitioning we divided thedatasets into training and test data and we computed therecognition rates for the test data by assigning to each testimage the corresponding class of the nearest neighbor Foreach dataset we chose randomly one image per class and wetested them against the other images that belonged to thetraining dataset The goal of HMS is to use as few sensingactions as possible and still obtain the highest possiblerecognition rate Therefore we searched for the minimumnumber of sensing actions that HMS needs to perform inorder to achieve the highest possible recognition rate for allof the tested images

Performance of HMSWith andWithout Foveation VersusRandom ProjectionsWe explored the benefits of the presented approach bycomparing HMS with and without foveation with theclassical CS method which uses random projections ie arandom Gaussian matrix with rows that have unit lengthand a smaller number of components In order to do thiswe considered the simplest configuration for the hierarchicalpartitioning of the data k = 2 clusters and the dimensionNL for the first level of the tree equal to 1 We computedthe recognition rate for differently sized regions of interesteg 16 for a foveated dataset and up to 100 for theoriginal dataset For both HMS and CS we used the same1NN classifier

Figure 2 shows for the database ALOI with 20 classeshow the recognition rate depends on the region of interestie the number of salient pixels The curves are plottedfor three different numbers (one three and six) of sensingvalues ie for L = 1 L = 2 and L = 3 respectively ForL = 1 HMS senses with a sensing matrix of dimension(N1timesD) where N1 = 1 ie it takes only one sensing valueFor L = 2 HMS senses with a sensing matrix of (2times D)which adds up to 1 + 2 = 3 sensing values and so on Itshould be noted that for L = 1 where only one sensingvalue is available foveation deteriorates the result However

J Imaging Sci Technol 020402-4 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Figure 2 The HMS results for the ALOI 20 database Recognition ratesare shown for different regions of interest with one three and six sensingvalues respectively The hierarchical partitioning is performed with k = 2and NL = 1 for L = 1

already with three and six sensing values the recognitionperformance does not increase with the number of pixels thatare considered ie the performance is equally high with anROI of only 5 of the image

Figure 3 shows a selection of seven out of 28 HMSsensing basis functions with an ROI of 8 (first row) 16(second row) and without foveation (third row) It shouldbe noted that the basis functions for the two different ROIsare specific to the test image shown on the right as each newbasis function depends on the previously acquired sensingvalues Thus the basis functions evolve as we continuesensing from rather generic to more specific templates Itshould also be noted that the ROIs adapt accordingly Incomparison the third row of Fig 3 shows the correspondingselection of basis function for the case where the hierarchicalpartitioning was computed for the original dataset and notfor the foveated dataset as shown before It should be noted

that without foveation the basis functions do also adaptduring the hierarchical partitioning but the adaption is lessspecific

In Figure 4 (a)ndash(c) we present representative resultswith foveation (ROI= 16) and without foveation (ROI=100) for different benchmarks COIL with 20 classes andALOI with 20 and 40 classes We compare the results ofHMSwith the results obtained by using a randomprojectionsmatrix for sensingwith the corresponding number of sensingvalues For allmethodswe computed 100 runs andwe presentin Fig 4 the average of the recognition rate over these runsAs can be seen in Fig 4 (a)ndash(c) on the ALOI database with20 and 40 classes and the COIL-20 database we are able toreach a recognition rate of 100 with a region of interestof only 16 and six sensing values which corresponds toa compression ratio greater than 4500 in the case of ALOIand greater than 2500 for COIL The compression ratio isdefined by the ratio between the original size of the imagesof the respective dataset (number of pixels) and the numberof sensing values used for recognition It should be notedthat the recognition performance of HMS is higher than therecognition rate obtained with the CS random projectionsapproach on the considered databases

We conclude that for small databases such as COILand ALOI with 20 and 40 classes it is sufficient to usethe HMS algorithm with a simple configuration a numberof two clusters and NL = 1 for L = 1 When the databasecontains more training images eg ALOI with 50 classesit is worth studying the evolution of HMS for differenthierarchical trees focussing on the different number ofprincipal components of the first level of the tree N1 andfor a different number of clusters k We show in Figure 5the results obtained with HMS for N1 = 1 2 and 3 in thecase of k= 2 (a) and k= 3 (b) As expected HMS performsbetter with a higher number of principal components for thefirst level of the hierarchical partitioning and with a proper kconsidering the number of data points in the training dataset

Figure 3 Selected HMS sensing basis functions (seven out of 28 sensing values) with an ROI of 8 (first row) 16 (second row) for the foveated datasetwithout foveation (ROI of 100mdashthird row) and the corresponding test image from ALOI with 20 classes For the hierarchical partitioning we used k = 2clusters and NL = 1 for L = 1

J Imaging Sci Technol 020402-5 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Figure 4 Representative results of HMS with and without foveation versusrandom projections for different benchmarks

We also evaluated the algorithm on the highly compet-itive MNIST18 benchmark which consists of handwrittendigits from 0 to 9 There are 60000 images for training and10000 for testing

We first considered the simple configuration for par-titioning the data with NL = 1 for L = 1 and only k = 2clusters Although the overall performance of a sensing andrecognition scheme with for example L = 12 is limited toa recognition rate of 9314 it is interesting to note that ofthe 10000 test images 2491 are already correctly recognized

Figure 5. Results of HMS for different hierarchical partitionings of the training data from ALOI with 50 classes.

Of the remaining test images, 2436 are correctly classified with L = 2, 2689 of the remaining images with L = 3, and 1290 of the remaining images with L = 4. If this scheme is continued up to L = 12 and L = 13, a total of 98.50% and 98.51%, respectively, of the test images are correctly classified. The difference between 98.50% and 93.14% at L = 12 is due to the fact that a few images are obviously misclassified with more sensing values, although they would have been correctly classified with fewer. We explored the performance of HMS on the MNIST dataset for different hierarchical partitionings of the training dataset, i.e., with different values of k and N1. As shown before for the previously considered databases, the recognition rate grows with N1. We show in Figure 6 the performance of HMS for different values of N1 in the case of (a) k = 2 and (b) k = 3 clusters. The curves are plotted for different numbers of sensing values, i.e., for L = 1, L = 2, ..., and L = 9 in Fig. 6(a), and in Fig. 6(b) for L = 1, L = 2, ..., and L = 6. As can be seen in Fig. 6(a), for k = 2 and N1 = 20 we reach a recognition rate of 96.69% for L = 3, i.e., with 63 sensing values, and for L = 4 we reach a higher recognition rate of 96.82%.
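The per-level accounting above can be made explicit with a short evaluation loop; the sensing step and the nearest-neighbor classifier are passed in as callables, since their exact form (described earlier) is abstracted away in this sketch.

def evaluate_per_level(test_images, labels, sense_at_level, nn_classify,
                       max_level=13):
    # Cumulative accounting: images recognized at a level are removed
    # from the pool; only unresolved images are sensed at deeper levels.
    remaining = set(range(len(test_images)))
    total_correct = 0
    for L in range(1, max_level + 1):
        newly = {i for i in remaining
                 if nn_classify(sense_at_level(test_images[i], L), L) == labels[i]}
        total_correct += len(newly)
        remaining -= newly
    return total_correct / len(test_images)  # e.g., 0.9851 for max_level = 13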


Figure 6. Results of HMS on MNIST for a hierarchical partitioning of the training data with N1 = 7, 9, 14, 20 and (a) k = 2, (b) k = 3, and using accumulated sensing values (Acc) over the different levels of the sensing tree.


If we accumulate the sensing values over the different levels of the sensing tree, the recognition rate improves, as shown in Fig. 6. With k = 2 and N1 = 20, HMS reaches a recognition rate of 96.88% with 63 sensing values and 96.93% with 86 sensing values.
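In code, accumulation simply means concatenating the sensing values of all levels visited so far before nearest-neighbor classification; the list of per-level sensing matrices below is a hypothetical stand-in for the learned basis functions along the path through the tree.

import numpy as np

def accumulated_sensing(x, sensing_matrices):
    # sensing_matrices: hypothetical list of learned sensing matrices
    # along the path through the sensing tree (one per level).
    return np.concatenate([W @ x for W in sensing_matrices])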

In Figure 7 we show the results for HMS compared with CS. The parameters for HMS are N1 = 20 and k = 2, and we computed only 10 runs. It should be noted that HMS outperforms CS. The region of interest of 23% seems not to contain sufficient salient pixels in order to reach a higher performance than HMS without foveation, as was the case for the other tested databases.
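For reference, the random-projections baseline can be sketched as follows: one fixed Gaussian sensing matrix per run, applied to every image. The MNIST-sized dimensions are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(seed=0)
m, n = 63, 28 * 28                              # sensing values, pixels
Phi = rng.standard_normal((m, n)) / np.sqrt(m)  # one fixed matrix per run

def random_projection_sense(x):
    # Each sensing value is a random weighted sum of all pixels.
    return Phi @ x                              # x: flattened image of length n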

Figure 8 shows a selection of five sensing basis functions for MNIST obtained with a hierarchical partitioning with k = 2 and N1 = 9. In the first column, sample test images with the corresponding number and class are shown, and on each line the corresponding sensing basis functions for a different number of sensing values, i.e., for L = 1, L = 3, L = 7, L = 8, and L = 9. One can note the evolution from rather generic to more specific templates.

Figure 7. Results on MNIST for HMS versus random projections.


Performance of HMS Versus MS and FMS

In Ref. 7 it has been shown that, for the ALOI database with 20 classes, MS needs 38 sensing values in order to reach a 100% recognition rate. In Ref. 8, FMS reaches only a 65% recognition rate with 15 sensing values. Here we showed that for a 100% recognition rate only six sensing values are needed for HMS with foveation and ten sensing values for HMS without foveation. Both MS and FMS strongly depend on the number of neighbors selected for the Locally Linear Embedding used to learn the manifolds, on the decreasing size of the adaptive dataset, and on the dimension of the manifolds at each iteration of the algorithm.

The important difference between HMS and MS/FMS is that with HMS the partitioning of the dataset is performed prior to sensing. As a consequence, finding the optimal parameters for the partitioning is more difficult for MS and FMS.

Performance of HMS Versus Other Methods

In 2014, Dornaika et al. (Ref. 16) developed a semi-supervised feature extraction with an out-of-sample extension algorithm, which they applied on a subset of the COIL-20 database (18 images from the 72 available for each object). They randomly selected 50% of the data as the training dataset and the rest as the test dataset. From the training dataset they randomly labeled one, two, and three samples per class, and the rest of the data were used as unlabeled data. The data are first preprocessed: PCA is computed in order to preserve 98% of the energy of the dataset. The work in Ref. 16 provides a comparison between methods that are based on label propagation and on graph-based semi-supervised embedding. They report the best average classification results on ten random splits for their method for three labeled samples, both for unlabeled (80.4%) and test data (77.4%).


Figure 8. Selected HMS sensing basis functions without foveation (second to fifth column, for L = 1, 3, 7, 8, 9) and the corresponding sample test image (first column) from the MNIST dataset. For the hierarchical partitioning we used k = 2 clusters and NL = 9 for L = 1.

They also show that when one labeled sample per class is used, their method reaches a 61% recognition rate with around 19 feature dimensions. In order to compare HMS with the approach proposed in Ref. 16, we divided the COIL dataset, with 72 images per object, into training and test datasets in a similar way to that described before. Although the training dataset consists of only 720 images, HMS performs better than the semi-supervised feature extraction algorithm of Ref. 16.

Thus, for a hierarchical partitioning of the training data with k = 2 and N1 = 1, HMS reaches an average recognition rate (over ten random splits of the data) of 94.98% with 15 sensing values. If the partitioning is performed with the same number of clusters but N1 = 2, a higher recognition rate of 95.98% at L = 5 is reached with 20 sensing values.
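The split-and-average protocol used for this comparison can be sketched as below; evaluate is a hypothetical callable wrapping the partitioning, sensing, and nearest-neighbor steps for one split.

import numpy as np

def average_over_splits(n_images, evaluate, n_splits=10, seed=0):
    # Random 50/50 splits; `evaluate` runs partitioning, sensing, and
    # nearest-neighbor classification on one split (hypothetical).
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(n_splits):
        idx = rng.permutation(n_images)
        rates.append(evaluate(idx[:n_images // 2], idx[n_images // 2:]))
    return np.mean(rates)

# e.g., average_over_splits(1440, hms_split_evaluator)  # COIL-20: 20 x 72 images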

A recent article (Ref. 17) presents an out-of-sample generalization algorithm for supervised manifold learning for classification, which is evaluated on the COIL-20 dataset. The authors use 71 images for each of the 20 objects in COIL, which they normalize, convert to grayscale, and downsample to a resolution of 32 × 32 pixels.


The algorithm embeds the images in a 25-dimensional space. They obtain a minimum average misclassification rate over five runs of approximately 2%. We compared HMS with the approach in Ref. 17 and obtained an average misclassification rate of 4.2% with ten sensing values and a 100% recognition rate with 15 sensing values. For the hierarchical partitioning we used k = 2 and N1 = 1. Thus, HMS reaches a 100% recognition rate with even fewer sensing values than in Ref. 17.

On the MNIST database, a baseline method (Ref. 18), which uses as input to a second-degree polynomial classifier the 40-dimensional feature vector of the training data obtained with PCA, has an error rate of 3.3%, compared with our 3.32% with 41 sensing values and 3.12% with 63 sensing values, as shown in Fig. 6(a).

State-of-the-art performance on MNIST has been achieved by a recurrent convolutional neural network (RCNN) approach, which introduces recurrent connections into each convolutional layer (Ref. 19). The approach reaches a testing error of 0.31% and uses 670,000 parameters. As argued in the following section, our goal is to explore the HMS algorithm in terms of the best recognition rate reached with as few sensing actions as possible, rather than increasing complexity for maximum performance.

CONCLUSIONS AND FUTURE WORK

We have presented a novel algorithm which aims at efficient sensing and have evaluated the efficiency in terms of the resulting recognition performance. We assume the availability of a dataset of images that represent the type of objects and scenes that need to be sensed and recognized. Based on these data, the goal is to learn a sensing strategy such that recognition is possible with few sensing values. Although we use a very simple nearest-neighbor classifier, on easy benchmarks such as COIL and ALOI with only some classes, perfect recognition is possible with only about ten sensing values. On harder benchmarks such as MNIST, state-of-the-art performance could not be reached, but we could show that a large number of test images could be recognized with only very few sensing values. Such performance resembles human performance, since humans can effortlessly recognize a multitude of objects based on just the gist of a scene and require scrutiny for less familiar objects and more difficult recognition tasks. A further bio-inspired element of our algorithm is foveation, and we have shown that gist-like sensing and recognition requires the whole image, whereas more refined sensing can be reduced to only a few salient locations without deteriorating recognition performance. It should be noted, however, that we do not use the saliency of the actual image but only the average saliency of all images in the hierarchical dataset. In terms of compressive sensing, we have here proposed a sensing scheme with a learned sensing matrix, and we have shown that it leads to better recognition performance than random sensing. In the case of foveation, the sensing matrix is also sparse, in addition to being learned, i.e., adapted to the specific dataset.

A weakness of the proposed approach is that it offers a rather high number of choices. For example, it is not obvious in which dimension to start sensing and how to increase the dimension of the embedding manifolds. In future work, however, this weakness could be turned into a benefit by exploring different strategies. A further weakness is that the method, like many others, will be less efficient if many different objects need to be recognized in the same scene. This scenario would most likely require a preliminary stage of object detection and rough object segmentation.

It should be noted that we are here addressing a problem that is not typically addressed in current computer-vision challenges but is becoming increasingly relevant as computer-vision systems become more pervasive. While currently the focus is on maximizing recognition performance, an equally challenging problem consists of finding the simplest solution for a given problem. Simplicity can be defined in different ways, and we here adopt the approach of using a minimum number of sensing values and a simple classifier. This reduces both the required bandwidth of the sensor and the required processing power.

ACKNOWLEDGMENTS

This research is funded by the Graduate School for Computing in Medicine and Life Sciences, funded by Germany's Excellence Initiative [DFG GSC 235/1].

REFERENCES

1. E. Candès and M. Wakin, "Introduction to compressive sampling," IEEE Signal Process. Mag. 25, 21–30 (2008).

2. B. Olshausen and D. Field, "Natural image statistics and efficient coding," Netw. Comput. Neural Syst. 7, 333–339 (1996).

3. D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory 52, 1289–1306 (2006).

4. H. Schütze, E. Barth, and T. Martinetz, "An adaptive hierarchical sensing scheme for sparse signals," Proc. SPIE 9014, 151–8 (2014).

5. R. Baraniuk and M. Wakin, "Random projections of smooth manifolds," Found. Comput. Math. 9, 51–77 (2009).

6. M. Chen, J. Silva, J. Paisley, C. Wang, D. Dunson, and L. Carin, "Compressive sensing on manifolds using a nonparametric mixture of factor analyzers: algorithm and performance bounds," IEEE Trans. Signal Process. 58, 6140–6155 (2010).

7. I. Burciu, A. Ion-Margineanu, T. Martinetz, and E. Barth, "Visual manifold sensing," Proc. SPIE 9014, 481–8 (2014).

8. I. Burciu, T. Martinetz, and E. Barth, "Foveated manifold sensing for object recognition," Proc. IEEE Black Sea Conf. on Communications and Networking (2015), pp. 196–200.

9. S. Roweis and L. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science 290, 2323–2326 (2000).

10. N. Tajunisha and V. Saravanan, "An efficient method to improve the clustering performance for high dimensional data by Principal Component Analysis and modified K-means," International Journal of Database Management Systems 3 (2011).

11. D. Arthur and S. Vassilvitskii, "K-means++: the advantages of careful seeding," SODA '07: Proc. Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (2007), pp. 1027–1035.

12. C. Zetzsche, K. Schill, H. Deubel, G. Krieger, E. Umkehrer, and S. Beinlich, "Investigation of a sensorimotor system for saccadic scene analysis: an integrated approach," From Animals to Animats 5: Proc. Fifth Int. Conf. on Simulation of Adaptive Behavior, edited by R. Pfeifer et al. (MIT Press, 1998), Vol. 5, pp. 120–126.

13. B. Jähne, H. Haußecker, and P. Geißler, Handbook of Computer Vision and Applications (Academic Press, 1999), Vol. 2.


14. S. A. Nene, S. K. Nayar, and H. Murase, Columbia Object Image Library (COIL-20), Technical Report CUCS-005-96 (1996).

15. J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders, "The Amsterdam Library of Object Images," Int. J. Comput. Vis. 61, 103–112 (2005).

16. F. Dornaika, Y. El Traboulsi, B. Cases, and A. Assoum, "Image classification via semi-supervised feature extraction with out-of-sample extension," Advances in Visual Computing: ISVC 2014, Part I (2014), Vol. 8887, pp. 182–192.

17. E. Vural and C. Guillemot, "Out-of-sample generalizations for supervised manifold learning for classification," http://arxiv.org/abs/1502.02410 [cs.CV] (2015).

18. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE 86, 2278–2324 (1998).

19. M. Liang and X. Hu, "Recurrent convolutional neural network for object recognition," IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (IEEE, Piscataway, NJ, 2015).

J Imaging Sci Technol 020402-10 Mar-Apr 2016

All in-text references underlined in blue are linked to publications on ResearchGate letting you access and read them immediatelyAll in-text references underlined in blue are linked to publications on ResearchGate letting you access and read them immediately

  • T1
  • F1
  • T2
  • T3
  • F3
  • F2
  • F4
  • F5
  • F6
  • F7
  • F8
  • B1
  • B2
  • B3
  • B4
  • B5
  • B6
  • B7
  • B8
  • B9
  • B10
  • B11
  • B12
  • B13
  • B14
  • B15
  • B16
  • B17
  • B18
  • B19
Page 5: Hierarchical Manifold Sensing with Foveation and Adaptive ...webmail.inb.uni-luebeck.de/inb-publications/pdfs/BuMaBa16.pdf · Baraniuk presented a theoretical analysis of CS for manifolds

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Figure 2 The HMS results for the ALOI 20 database Recognition ratesare shown for different regions of interest with one three and six sensingvalues respectively The hierarchical partitioning is performed with k = 2and NL = 1 for L = 1

already with three and six sensing values the recognitionperformance does not increase with the number of pixels thatare considered ie the performance is equally high with anROI of only 5 of the image

Figure 3 shows a selection of seven out of 28 HMSsensing basis functions with an ROI of 8 (first row) 16(second row) and without foveation (third row) It shouldbe noted that the basis functions for the two different ROIsare specific to the test image shown on the right as each newbasis function depends on the previously acquired sensingvalues Thus the basis functions evolve as we continuesensing from rather generic to more specific templates Itshould also be noted that the ROIs adapt accordingly Incomparison the third row of Fig 3 shows the correspondingselection of basis function for the case where the hierarchicalpartitioning was computed for the original dataset and notfor the foveated dataset as shown before It should be noted

that without foveation the basis functions do also adaptduring the hierarchical partitioning but the adaption is lessspecific

In Figure 4 (a)ndash(c) we present representative resultswith foveation (ROI= 16) and without foveation (ROI=100) for different benchmarks COIL with 20 classes andALOI with 20 and 40 classes We compare the results ofHMSwith the results obtained by using a randomprojectionsmatrix for sensingwith the corresponding number of sensingvalues For allmethodswe computed 100 runs andwe presentin Fig 4 the average of the recognition rate over these runsAs can be seen in Fig 4 (a)ndash(c) on the ALOI database with20 and 40 classes and the COIL-20 database we are able toreach a recognition rate of 100 with a region of interestof only 16 and six sensing values which corresponds toa compression ratio greater than 4500 in the case of ALOIand greater than 2500 for COIL The compression ratio isdefined by the ratio between the original size of the imagesof the respective dataset (number of pixels) and the numberof sensing values used for recognition It should be notedthat the recognition performance of HMS is higher than therecognition rate obtained with the CS random projectionsapproach on the considered databases

We conclude that for small databases such as COILand ALOI with 20 and 40 classes it is sufficient to usethe HMS algorithm with a simple configuration a numberof two clusters and NL = 1 for L = 1 When the databasecontains more training images eg ALOI with 50 classesit is worth studying the evolution of HMS for differenthierarchical trees focussing on the different number ofprincipal components of the first level of the tree N1 andfor a different number of clusters k We show in Figure 5the results obtained with HMS for N1 = 1 2 and 3 in thecase of k= 2 (a) and k= 3 (b) As expected HMS performsbetter with a higher number of principal components for thefirst level of the hierarchical partitioning and with a proper kconsidering the number of data points in the training dataset

Figure 3 Selected HMS sensing basis functions (seven out of 28 sensing values) with an ROI of 8 (first row) 16 (second row) for the foveated datasetwithout foveation (ROI of 100mdashthird row) and the corresponding test image from ALOI with 20 classes For the hierarchical partitioning we used k = 2clusters and NL = 1 for L = 1

J Imaging Sci Technol 020402-5 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Figure 4 Representative results of HMS with and without foveation versusrandom projections for different benchmarks

We also evaluated the algorithm on the highly compet-itive MNIST18 benchmark which consists of handwrittendigits from 0 to 9 There are 60000 images for training and10000 for testing

We first considered the simple configuration for par-titioning the data with NL = 1 for L = 1 and only k = 2clusters Although the overall performance of a sensing andrecognition scheme with for example L = 12 is limited toa recognition rate of 9314 it is interesting to note that ofthe 10000 test images 2491 are already correctly recognized

Figure 5 Results of HMS for different hierarchical partitionings of thetraining data from ALOI with 50 classes

with only one sensing action (L= 1) Of the remaining testimages 2436 are correctly classified with L= 2 2689 of theremaining images with L = 3 and 1290 of the remainingimages with L= 4 If this scheme is continued up to L= 12and L= 13 a total of 9850 and 9851 respectively of thetest images are correctly classified The difference between9850 and 9314 at L = 12 is due to the fact that afew images are obviously misclassified with more sensingvalues although they would have been correctly classifiedwith fewer We explored the performance of HMS on theMNIST dataset for different hierarchical partitionings of thetraining dataset ie with different values of k and N1 Asshown before for the previously considered databases therecognition rate grows with N1 We show in Figure 6 theperformance of HMS for different values of N1 in the caseof (a) k= 2 and (b) k= 3 clusters The curves are plotted fordifferent numbers of sensing values ie forL= 1L= 2 and L= 9 in Fig 6 (a) and in Fig 6 (b) for L= 1 L= 2 and L= 6 As can be seen in Fig 6 (a) for k= 2 andN1 = 20we reach a recognition rate of 9669 for L= 3 ie with 63

J Imaging Sci Technol 020402-6 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Figure 6 Results of HMS on MNIST for a hierarchical partitioning of thetraining data with N1 = 791420 and (a) k = 2 (b) k = 3 and usingaccumulated sensing values (Acc) over the different levels of the sensingtree

sensing values and for L= 4 we reach a higher recognitionrate of 9682

If we accumulate the sensing values over the differentlevels of the sensing tree the recognition rate improves asshown in Fig 6 With k = 2 and N1 = 20 HMS reaches arecognition rate of 9688with 63 sensing values and 9693with 86 sensing values

In Figure 7 we show the results for HMS compared withCS The parameters for HMS are N1 = 20 and k = 2 andwe computed only 10 runs It should be noted that HMSoutperforms CS The region of interest of 23 seems not tocontain sufficient salient pixels in order to reach a higherperformance than HMS without foveation as in the case ofthe other tested databases

Figure 8 shows a selection of five sensing basis functionsfor MNIST obtained with a hierarchical partitioning withk= 2 andN1 = 9 In the first column sample test imageswiththe corresponding number and class are shown and on each

Figure 7 Results on MNIST for HMS versus random projections

line the corresponding sensing basis functions for a differentnumber of sensing values ie for L= 1 L= 3 L= 7 L= 8and L= 9 One can note the evolution from rather generic tomore specific templates

Performance of HMS Versus MS and FMSIn Ref 7 it has been shown that for the ALOI database with20 classes MS needs 38 sensing values in order to reach100 recognition rate In Ref 8 FMS reaches only 65recognition rate with 15 sensing values Here we showedthat for a 100 recognition rate only six sensing values areneeded for HMS with foveation and ten sensing values forHMS without foveation Both MS and FMS strongly dependon the number of neighbors selected for the Locally LinearEmbedding used to learn the manifolds on the decreasingsize of the adaptive dataset and on the dimension of themanifolds at each iteration of the algorithm

The important difference between HMS andMSFMS isthat with HMS the partitioning of the dataset is performedprior to sensing As a consequence finding the optimalparameters for the partitioning is more difficult for MS andFMS

Performance of HMS Versus Other MethodsIn 2014 Dornaika et al16 developed a semi-supervisedfeature extractionwith an out-of-sample extension algorithmwhich they applied on a subset of the COIL-20 (18 imagesfrom 72 available for each object) database They randomlyselected 50 of the data as the training dataset and therest as the test dataset From the training dataset theyrandomly labeled one two and three samples per classand the rest of the data were used as unlabeled data Thedata are first preprocessed PCA is computed in order topreserve 98 of the energy of the dataset The work Ref 16provides a comparison between methods that are basedon label propagation and on graph-based semi-supervisedembedding They report the best average classification resultson ten random splits for their method for three label samplesand for unlabeled (804) and test data (774) They also

J Imaging Sci Technol 020402-7 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Figure 8 Selected HMS sensing basis functions without foveation (second to fifth columnmdashfor L = 13789) and the corresponding sample test image(first column) from the MNIST dataset For the hierarchical partitioning we used k = 2 clusters and NL = 9 for L = 1

show that when one labeled sample per class is used theirmethod reaches 61 recognition rate with around 19 featuredimensions In order to compare HMS with the approachproposed in Ref 16 we divided the COIL dataset with 72objects per class into training and test datasets in a similarway to that described before Although the training datasetconsists only of 720 images HMS performs better thanthe semi-supervised feature extraction algorithm in Ref 16Thus for a hierarchical partitioning of the training data with

k= 2 and N1 = 1 HMS reaches an average recognition rate(over ten random splits of the data) of 9498with 15 sensingvalues If the partitioning is performedwith the samenumberof clusters but N1 = 2 a higher recognition rate of 9598 atL= 5 is reached with 20 sensing values

A recent article17 presents an out-of-sample gener-alization algorithm for supervised manifold learning forclassification which is evaluated on the COIL-20 datasetThe authors use 71 images for each of the 20 objects in

J Imaging Sci Technol 020402-8 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

COIL which they normalize convert to grayscale anddownsample to a resolution of 32times 32 pixels The algorithmembeds the images in a 25-dimensional space They obtaina minimum average misclassification rate over five runs ofapproximately 2 We compared HMS with the approach inRef 17 andwe obtained an averagemisclassification ratewithten sensing values of 42 and 100 recognition rate with15 sensing values For the hierarchical partitioning we usedk= 2 andN1 = 1 Thus HMS reaches 100 recognition ratewith even fewer sensing values than in Ref 17

On the MNIST database a baseline method18 whichuses as input to a second-degree polynomial classifier the40-dimensional feature vector of the training data obtainedwith PCA has an error rate of 33 compared with our 332with 41 sensing values and 312 with 63 sensing values asshown in Fig 6 (a)

State-of-the-art performance on MNIST has beenachieved by a recurrent convolutional neural network(RCNN) approach which introduces recurrent connectionsinto each convolutional layer19 The approach reaches atesting error of 031 and uses 670000 parameters Asargued in the following section our goal is to explore theHMS algorithm in terms of the best recognition rate reachedwith as few sensing actions as possible rather than increasingcomplexity for maximum performance

CONCLUSIONS AND FUTUREWORKWe have presented a novel algorithm which aims at efficientsensing and have evaluated the efficiency in terms ofthe resulting recognition performance We assume theavailability of a dataset of images that represent the type ofobjects and scenes that need to be sensed and recognizedBased on these data the goal is to learn a sensing strategysuch that recognition is possible with few sensing valuesAlthough we use a very simple nearest-neighbor classifieron easy benchmarks such as COIL and ALOI with onlysome classes perfect recognition is possible with onlyabout ten sensing values On harder benchmarks such asMNIST state-of-the-art performance could not be reachedbut we could show that a large number of test imagescould be recognized with only very few sensing valuesSuch performance resembles human performance sincehumans can effortlessly recognize a multitude of objectsbased on just the gist of a scene and require scrutiny forless familiar objects and more difficult recognition tasks Afurther bio-inspired element of our algorithm is foveationand we have shown that gist-like sensing and recognitionrequires the whole image whereas more refined sensing canbe reduced to only few salient locations without deterioratingrecognition performance It should be noted however thatwe do not use the saliency of the actual image but only theaverage saliency of all images in the hierarchical datasetIn terms of compressive sensing we have here proposed asensing scheme with a learned sensing matrix and we haveshown that it leads to better recognition performance thanrandom sensing In the case of foveation the sensing matrixis also sparse in addition to being learned ie adapted to thespecific dataset

A weakness of the proposed approach is that it offers arather high number of choices For example it is not obviousin which dimension to start sensing and how to increasethe dimension of the embedding manifolds In future workhowever this weakness could be turned into a benefit byexploring different strategies A further weakness is thatthe method like many others will be less efficient if manydifferent objects need to be recognized in the same sceneThis scenario would most likely require a preliminary stageof object detection and rough object segmentation

It should be noted that we are here addressing aproblem that is not typically addressed in current computer-vision challenges but is becoming increasingly relevantas computer-vision systems are becoming more pervasiveWhile currently the focus is on maximizing recognitionperformance an equally challenging problem consists offinding the simplest solution for a given problem Simplicitycan be defined in different ways and we here adopt theapproach of using a minimum number of sensing values anda simple classifier This reduces both the required bandwidthof the sensor and the required processing power

ACKNOWLEDGMENTSThis research is funded by the Graduate School for Com-puting in Medicine and Life Sciences funded by GermanyrsquosExcellence Initiative [DFG GSC 2351]

REFERENCES1 E Candegraves and M Wakin lsquolsquoIntroduction to compressive samplingrsquorsquo IEEESignal Process Mag 25 21ndash30 (2008)

2 B Olshausen and D Field lsquolsquoNatural image statistics and efficient codingrsquorsquoNetw Comput Neural Syst 7 333ndash339 (1996)

3 D Donoho lsquolsquoCompressed sensingrsquorsquo IEEE Trans Inf Theory 52 1289ndash1306 (2006)

4 H Schuumltze E Barth and T Martinetz lsquolsquoAn adaptive hierarchical sensingscheme for sparse signalsrsquorsquo Proc SPIE 9014 151ndash8 (2014)

5 R Baraniuk and M Wakin lsquolsquoRandom projections of smooth manifoldsrsquorsquoFound Comput Math 9 51ndash77 (2009)

6 M Chen J Silva J Paisley C Wang D Dunson and L Carin lsquolsquoCom-pressive sensing on manifolds using a nonparametric mixture of factoranalyzers algorithm and performance boundsrsquorsquo IEEE Trans SignalProcess 58 6140ndash6155 (2010)

7 I Burciu A Ion-Margineanu T Martinetz and E Barth lsquolsquoVisual mani-fold sensingrsquorsquo Proc SPIE 9014 481ndash8 (2014)

8 I Burciu T Martinetz and E Barth lsquolsquoFoveated manifold sensing forobject recognitionrsquorsquo Proc IEEE Black Sea Conf on Communications andNetworking (2015) pp 196ndash200

9 S Roweis and L Saul lsquolsquoNonlinear dimensionality reduction by locallylinear embeddingrsquorsquo Science 290 2323ndash2326 (2000)

10 N Tajunisha and V Saravanan lsquolsquoAn efficient method to improvethe clustering performance for high dimensional data by PrincipalComponent Analysis and modified K-meansrsquorsquo International Journal ofDatabase Management Systems 3 (2011)

11 D Arthur and S Vassilvitskii lsquolsquoK-means++ the advantages of carefulseedingrsquorsquo SODA rsquo07 Proc Eighteenth Annual ACM-SIAM symposium onDiscrete Algorithms (2007) pp 1027ndash1035

12 C Zetzsche K Schill H Deubel G Krieger E Umkehrer andS Beinlich lsquolsquoInvestigation of a sensorimotor system for saccadic sceneanalysis an integrated approachrsquorsquo From Animals to Animats 5 Proc FifthInt Conf on Simulation of Adaptive Behavior edited by R Pfeifer et al(MIT Press 1998) Vol 5 pp 120ndash126

13 B Jaumlhne HHauszligecker and P GeiszliglerHandbook of Computer Vision andApplications (Academic Press 1999) Vol 2

J Imaging Sci Technol 020402-9 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

14 S A Nene S K Nayar and H Murase Columbia Object Image Library(COIL-20) Technical Report CUCS-005-96 (1996)

15 J M Geusebroek G J Burghouts and A W M Smeulders lsquolsquoThe Ams-terdamLibrary ofObject Imagesrsquorsquo Int J Comput Vis 61 103ndash112 (2005)

16 F Dornaika Y El Traboulsi B Cases and A Assoum lsquolsquoImageclassification via semi-supervised feature extraction with out-of-sampleextensionrsquorsquo Advances in Visual Computing ISCV Part I (2014) Vol 8887pp 182ndash192

17 E Vural and C Guillemot Out-of-sample generalizations for supervisedmanifold learning for classification httparxivorgabs150202410[csCV] (2015)

18 Y LeCun L Bottou Y Bengio and P Haffner lsquolsquoGradient-based learningapplied to document recognitionrsquorsquo Proc IEEE 86 2278ndash2324 (1998)

19 M Liang and X Hu lsquolsquoRecurrent convolutional neural network for objectrecognitionrsquorsquo IEEE Conf on Computer Vision and Pattern Recognition(CVPR) (IEEE Piscataway NJ 2015)

J Imaging Sci Technol 020402-10 Mar-Apr 2016

All in-text references underlined in blue are linked to publications on ResearchGate letting you access and read them immediatelyAll in-text references underlined in blue are linked to publications on ResearchGate letting you access and read them immediately

  • T1
  • F1
  • T2
  • T3
  • F3
  • F2
  • F4
  • F5
  • F6
  • F7
  • F8
  • B1
  • B2
  • B3
  • B4
  • B5
  • B6
  • B7
  • B8
  • B9
  • B10
  • B11
  • B12
  • B13
  • B14
  • B15
  • B16
  • B17
  • B18
  • B19
Page 6: Hierarchical Manifold Sensing with Foveation and Adaptive ...webmail.inb.uni-luebeck.de/inb-publications/pdfs/BuMaBa16.pdf · Baraniuk presented a theoretical analysis of CS for manifolds

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Figure 4 Representative results of HMS with and without foveation versusrandom projections for different benchmarks

We also evaluated the algorithm on the highly compet-itive MNIST18 benchmark which consists of handwrittendigits from 0 to 9 There are 60000 images for training and10000 for testing

We first considered the simple configuration for par-titioning the data with NL = 1 for L = 1 and only k = 2clusters Although the overall performance of a sensing andrecognition scheme with for example L = 12 is limited toa recognition rate of 9314 it is interesting to note that ofthe 10000 test images 2491 are already correctly recognized

Figure 5 Results of HMS for different hierarchical partitionings of thetraining data from ALOI with 50 classes

with only one sensing action (L= 1) Of the remaining testimages 2436 are correctly classified with L= 2 2689 of theremaining images with L = 3 and 1290 of the remainingimages with L= 4 If this scheme is continued up to L= 12and L= 13 a total of 9850 and 9851 respectively of thetest images are correctly classified The difference between9850 and 9314 at L = 12 is due to the fact that afew images are obviously misclassified with more sensingvalues although they would have been correctly classifiedwith fewer We explored the performance of HMS on theMNIST dataset for different hierarchical partitionings of thetraining dataset ie with different values of k and N1 Asshown before for the previously considered databases therecognition rate grows with N1 We show in Figure 6 theperformance of HMS for different values of N1 in the caseof (a) k= 2 and (b) k= 3 clusters The curves are plotted fordifferent numbers of sensing values ie forL= 1L= 2 and L= 9 in Fig 6 (a) and in Fig 6 (b) for L= 1 L= 2 and L= 6 As can be seen in Fig 6 (a) for k= 2 andN1 = 20we reach a recognition rate of 9669 for L= 3 ie with 63

J Imaging Sci Technol 020402-6 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Figure 6 Results of HMS on MNIST for a hierarchical partitioning of thetraining data with N1 = 791420 and (a) k = 2 (b) k = 3 and usingaccumulated sensing values (Acc) over the different levels of the sensingtree

sensing values and for L= 4 we reach a higher recognitionrate of 9682

If we accumulate the sensing values over the differentlevels of the sensing tree the recognition rate improves asshown in Fig 6 With k = 2 and N1 = 20 HMS reaches arecognition rate of 9688with 63 sensing values and 9693with 86 sensing values

In Figure 7 we show the results for HMS compared withCS The parameters for HMS are N1 = 20 and k = 2 andwe computed only 10 runs It should be noted that HMSoutperforms CS The region of interest of 23 seems not tocontain sufficient salient pixels in order to reach a higherperformance than HMS without foveation as in the case ofthe other tested databases

Figure 8 shows a selection of five sensing basis functionsfor MNIST obtained with a hierarchical partitioning withk= 2 andN1 = 9 In the first column sample test imageswiththe corresponding number and class are shown and on each

Figure 7 Results on MNIST for HMS versus random projections

line the corresponding sensing basis functions for a differentnumber of sensing values ie for L= 1 L= 3 L= 7 L= 8and L= 9 One can note the evolution from rather generic tomore specific templates

Performance of HMS Versus MS and FMSIn Ref 7 it has been shown that for the ALOI database with20 classes MS needs 38 sensing values in order to reach100 recognition rate In Ref 8 FMS reaches only 65recognition rate with 15 sensing values Here we showedthat for a 100 recognition rate only six sensing values areneeded for HMS with foveation and ten sensing values forHMS without foveation Both MS and FMS strongly dependon the number of neighbors selected for the Locally LinearEmbedding used to learn the manifolds on the decreasingsize of the adaptive dataset and on the dimension of themanifolds at each iteration of the algorithm

The important difference between HMS andMSFMS isthat with HMS the partitioning of the dataset is performedprior to sensing As a consequence finding the optimalparameters for the partitioning is more difficult for MS andFMS

Performance of HMS Versus Other MethodsIn 2014 Dornaika et al16 developed a semi-supervisedfeature extractionwith an out-of-sample extension algorithmwhich they applied on a subset of the COIL-20 (18 imagesfrom 72 available for each object) database They randomlyselected 50 of the data as the training dataset and therest as the test dataset From the training dataset theyrandomly labeled one two and three samples per classand the rest of the data were used as unlabeled data Thedata are first preprocessed PCA is computed in order topreserve 98 of the energy of the dataset The work Ref 16provides a comparison between methods that are basedon label propagation and on graph-based semi-supervisedembedding They report the best average classification resultson ten random splits for their method for three label samplesand for unlabeled (804) and test data (774) They also

J Imaging Sci Technol 020402-7 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Figure 8 Selected HMS sensing basis functions without foveation (second to fifth columnmdashfor L = 13789) and the corresponding sample test image(first column) from the MNIST dataset For the hierarchical partitioning we used k = 2 clusters and NL = 9 for L = 1

show that when one labeled sample per class is used theirmethod reaches 61 recognition rate with around 19 featuredimensions In order to compare HMS with the approachproposed in Ref 16 we divided the COIL dataset with 72objects per class into training and test datasets in a similarway to that described before Although the training datasetconsists only of 720 images HMS performs better thanthe semi-supervised feature extraction algorithm in Ref 16Thus for a hierarchical partitioning of the training data with

k= 2 and N1 = 1 HMS reaches an average recognition rate(over ten random splits of the data) of 9498with 15 sensingvalues If the partitioning is performedwith the samenumberof clusters but N1 = 2 a higher recognition rate of 9598 atL= 5 is reached with 20 sensing values

A recent article17 presents an out-of-sample gener-alization algorithm for supervised manifold learning forclassification which is evaluated on the COIL-20 datasetThe authors use 71 images for each of the 20 objects in

J Imaging Sci Technol 020402-8 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

COIL which they normalize convert to grayscale anddownsample to a resolution of 32times 32 pixels The algorithmembeds the images in a 25-dimensional space They obtaina minimum average misclassification rate over five runs ofapproximately 2 We compared HMS with the approach inRef 17 andwe obtained an averagemisclassification ratewithten sensing values of 42 and 100 recognition rate with15 sensing values For the hierarchical partitioning we usedk= 2 andN1 = 1 Thus HMS reaches 100 recognition ratewith even fewer sensing values than in Ref 17

On the MNIST database a baseline method18 whichuses as input to a second-degree polynomial classifier the40-dimensional feature vector of the training data obtainedwith PCA has an error rate of 33 compared with our 332with 41 sensing values and 312 with 63 sensing values asshown in Fig 6 (a)

State-of-the-art performance on MNIST has beenachieved by a recurrent convolutional neural network(RCNN) approach which introduces recurrent connectionsinto each convolutional layer19 The approach reaches atesting error of 031 and uses 670000 parameters Asargued in the following section our goal is to explore theHMS algorithm in terms of the best recognition rate reachedwith as few sensing actions as possible rather than increasingcomplexity for maximum performance

CONCLUSIONS AND FUTUREWORKWe have presented a novel algorithm which aims at efficientsensing and have evaluated the efficiency in terms ofthe resulting recognition performance We assume theavailability of a dataset of images that represent the type ofobjects and scenes that need to be sensed and recognizedBased on these data the goal is to learn a sensing strategysuch that recognition is possible with few sensing valuesAlthough we use a very simple nearest-neighbor classifieron easy benchmarks such as COIL and ALOI with onlysome classes perfect recognition is possible with onlyabout ten sensing values On harder benchmarks such asMNIST state-of-the-art performance could not be reachedbut we could show that a large number of test imagescould be recognized with only very few sensing valuesSuch performance resembles human performance sincehumans can effortlessly recognize a multitude of objectsbased on just the gist of a scene and require scrutiny forless familiar objects and more difficult recognition tasks Afurther bio-inspired element of our algorithm is foveationand we have shown that gist-like sensing and recognitionrequires the whole image whereas more refined sensing canbe reduced to only few salient locations without deterioratingrecognition performance It should be noted however thatwe do not use the saliency of the actual image but only theaverage saliency of all images in the hierarchical datasetIn terms of compressive sensing we have here proposed asensing scheme with a learned sensing matrix and we haveshown that it leads to better recognition performance thanrandom sensing In the case of foveation the sensing matrixis also sparse in addition to being learned ie adapted to thespecific dataset

A weakness of the proposed approach is that it offers arather high number of choices For example it is not obviousin which dimension to start sensing and how to increasethe dimension of the embedding manifolds In future workhowever this weakness could be turned into a benefit byexploring different strategies A further weakness is thatthe method like many others will be less efficient if manydifferent objects need to be recognized in the same sceneThis scenario would most likely require a preliminary stageof object detection and rough object segmentation

It should be noted that we are here addressing aproblem that is not typically addressed in current computer-vision challenges but is becoming increasingly relevantas computer-vision systems are becoming more pervasiveWhile currently the focus is on maximizing recognitionperformance an equally challenging problem consists offinding the simplest solution for a given problem Simplicitycan be defined in different ways and we here adopt theapproach of using a minimum number of sensing values anda simple classifier This reduces both the required bandwidthof the sensor and the required processing power

ACKNOWLEDGMENTSThis research is funded by the Graduate School for Com-puting in Medicine and Life Sciences funded by GermanyrsquosExcellence Initiative [DFG GSC 2351]

REFERENCES1 E Candegraves and M Wakin lsquolsquoIntroduction to compressive samplingrsquorsquo IEEESignal Process Mag 25 21ndash30 (2008)

2 B Olshausen and D Field lsquolsquoNatural image statistics and efficient codingrsquorsquoNetw Comput Neural Syst 7 333ndash339 (1996)

3 D Donoho lsquolsquoCompressed sensingrsquorsquo IEEE Trans Inf Theory 52 1289ndash1306 (2006)

4 H Schuumltze E Barth and T Martinetz lsquolsquoAn adaptive hierarchical sensingscheme for sparse signalsrsquorsquo Proc SPIE 9014 151ndash8 (2014)

5 R Baraniuk and M Wakin lsquolsquoRandom projections of smooth manifoldsrsquorsquoFound Comput Math 9 51ndash77 (2009)

6 M Chen J Silva J Paisley C Wang D Dunson and L Carin lsquolsquoCom-pressive sensing on manifolds using a nonparametric mixture of factoranalyzers algorithm and performance boundsrsquorsquo IEEE Trans SignalProcess 58 6140ndash6155 (2010)

7 I Burciu A Ion-Margineanu T Martinetz and E Barth lsquolsquoVisual mani-fold sensingrsquorsquo Proc SPIE 9014 481ndash8 (2014)

8 I Burciu T Martinetz and E Barth lsquolsquoFoveated manifold sensing forobject recognitionrsquorsquo Proc IEEE Black Sea Conf on Communications andNetworking (2015) pp 196ndash200

9 S Roweis and L Saul lsquolsquoNonlinear dimensionality reduction by locallylinear embeddingrsquorsquo Science 290 2323ndash2326 (2000)

10 N Tajunisha and V Saravanan lsquolsquoAn efficient method to improvethe clustering performance for high dimensional data by PrincipalComponent Analysis and modified K-meansrsquorsquo International Journal ofDatabase Management Systems 3 (2011)

11 D Arthur and S Vassilvitskii lsquolsquoK-means++ the advantages of carefulseedingrsquorsquo SODA rsquo07 Proc Eighteenth Annual ACM-SIAM symposium onDiscrete Algorithms (2007) pp 1027ndash1035

12 C Zetzsche K Schill H Deubel G Krieger E Umkehrer andS Beinlich lsquolsquoInvestigation of a sensorimotor system for saccadic sceneanalysis an integrated approachrsquorsquo From Animals to Animats 5 Proc FifthInt Conf on Simulation of Adaptive Behavior edited by R Pfeifer et al(MIT Press 1998) Vol 5 pp 120ndash126

13 B Jaumlhne HHauszligecker and P GeiszliglerHandbook of Computer Vision andApplications (Academic Press 1999) Vol 2

J Imaging Sci Technol 020402-9 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

14 S A Nene S K Nayar and H Murase Columbia Object Image Library(COIL-20) Technical Report CUCS-005-96 (1996)

15 J M Geusebroek G J Burghouts and A W M Smeulders lsquolsquoThe Ams-terdamLibrary ofObject Imagesrsquorsquo Int J Comput Vis 61 103ndash112 (2005)

16 F Dornaika Y El Traboulsi B Cases and A Assoum lsquolsquoImageclassification via semi-supervised feature extraction with out-of-sampleextensionrsquorsquo Advances in Visual Computing ISCV Part I (2014) Vol 8887pp 182ndash192

17 E Vural and C Guillemot Out-of-sample generalizations for supervisedmanifold learning for classification httparxivorgabs150202410[csCV] (2015)

18 Y LeCun L Bottou Y Bengio and P Haffner lsquolsquoGradient-based learningapplied to document recognitionrsquorsquo Proc IEEE 86 2278ndash2324 (1998)

19 M Liang and X Hu lsquolsquoRecurrent convolutional neural network for objectrecognitionrsquorsquo IEEE Conf on Computer Vision and Pattern Recognition(CVPR) (IEEE Piscataway NJ 2015)

J Imaging Sci Technol 020402-10 Mar-Apr 2016

All in-text references underlined in blue are linked to publications on ResearchGate letting you access and read them immediatelyAll in-text references underlined in blue are linked to publications on ResearchGate letting you access and read them immediately

  • T1
  • F1
  • T2
  • T3
  • F3
  • F2
  • F4
  • F5
  • F6
  • F7
  • F8
  • B1
  • B2
  • B3
  • B4
  • B5
  • B6
  • B7
  • B8
  • B9
  • B10
  • B11
  • B12
  • B13
  • B14
  • B15
  • B16
  • B17
  • B18
  • B19
Page 7: Hierarchical Manifold Sensing with Foveation and Adaptive ...webmail.inb.uni-luebeck.de/inb-publications/pdfs/BuMaBa16.pdf · Baraniuk presented a theoretical analysis of CS for manifolds

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Figure 6 Results of HMS on MNIST for a hierarchical partitioning of thetraining data with N1 = 791420 and (a) k = 2 (b) k = 3 and usingaccumulated sensing values (Acc) over the different levels of the sensingtree

sensing values and for L= 4 we reach a higher recognitionrate of 9682

If we accumulate the sensing values over the differentlevels of the sensing tree the recognition rate improves asshown in Fig 6 With k = 2 and N1 = 20 HMS reaches arecognition rate of 9688with 63 sensing values and 9693with 86 sensing values

In Figure 7 we show the results for HMS compared withCS The parameters for HMS are N1 = 20 and k = 2 andwe computed only 10 runs It should be noted that HMSoutperforms CS The region of interest of 23 seems not tocontain sufficient salient pixels in order to reach a higherperformance than HMS without foveation as in the case ofthe other tested databases

Figure 8 shows a selection of five sensing basis functionsfor MNIST obtained with a hierarchical partitioning withk= 2 andN1 = 9 In the first column sample test imageswiththe corresponding number and class are shown and on each

Figure 7 Results on MNIST for HMS versus random projections

line the corresponding sensing basis functions for a differentnumber of sensing values ie for L= 1 L= 3 L= 7 L= 8and L= 9 One can note the evolution from rather generic tomore specific templates

Performance of HMS Versus MS and FMSIn Ref 7 it has been shown that for the ALOI database with20 classes MS needs 38 sensing values in order to reach100 recognition rate In Ref 8 FMS reaches only 65recognition rate with 15 sensing values Here we showedthat for a 100 recognition rate only six sensing values areneeded for HMS with foveation and ten sensing values forHMS without foveation Both MS and FMS strongly dependon the number of neighbors selected for the Locally LinearEmbedding used to learn the manifolds on the decreasingsize of the adaptive dataset and on the dimension of themanifolds at each iteration of the algorithm

The important difference between HMS andMSFMS isthat with HMS the partitioning of the dataset is performedprior to sensing As a consequence finding the optimalparameters for the partitioning is more difficult for MS andFMS

Performance of HMS Versus Other MethodsIn 2014 Dornaika et al16 developed a semi-supervisedfeature extractionwith an out-of-sample extension algorithmwhich they applied on a subset of the COIL-20 (18 imagesfrom 72 available for each object) database They randomlyselected 50 of the data as the training dataset and therest as the test dataset From the training dataset theyrandomly labeled one two and three samples per classand the rest of the data were used as unlabeled data Thedata are first preprocessed PCA is computed in order topreserve 98 of the energy of the dataset The work Ref 16provides a comparison between methods that are basedon label propagation and on graph-based semi-supervisedembedding They report the best average classification resultson ten random splits for their method for three label samplesand for unlabeled (804) and test data (774) They also

J Imaging Sci Technol 020402-7 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

Figure 8 Selected HMS sensing basis functions without foveation (second to fifth columnmdashfor L = 13789) and the corresponding sample test image(first column) from the MNIST dataset For the hierarchical partitioning we used k = 2 clusters and NL = 9 for L = 1

show that when one labeled sample per class is used theirmethod reaches 61 recognition rate with around 19 featuredimensions In order to compare HMS with the approachproposed in Ref 16 we divided the COIL dataset with 72objects per class into training and test datasets in a similarway to that described before Although the training datasetconsists only of 720 images HMS performs better thanthe semi-supervised feature extraction algorithm in Ref 16Thus for a hierarchical partitioning of the training data with

k= 2 and N1 = 1 HMS reaches an average recognition rate(over ten random splits of the data) of 9498with 15 sensingvalues If the partitioning is performedwith the samenumberof clusters but N1 = 2 a higher recognition rate of 9598 atL= 5 is reached with 20 sensing values

A recent article17 presents an out-of-sample gener-alization algorithm for supervised manifold learning forclassification which is evaluated on the COIL-20 datasetThe authors use 71 images for each of the 20 objects in

J Imaging Sci Technol 020402-8 Mar-Apr 2016

Burciu Martinetz and Barth Hierarchical Manifold Sensing with foveation and adaptive partitioning of the dataset

COIL which they normalize convert to grayscale anddownsample to a resolution of 32times 32 pixels The algorithmembeds the images in a 25-dimensional space They obtaina minimum average misclassification rate over five runs ofapproximately 2 We compared HMS with the approach inRef 17 andwe obtained an averagemisclassification ratewithten sensing values of 42 and 100 recognition rate with15 sensing values For the hierarchical partitioning we usedk= 2 andN1 = 1 Thus HMS reaches 100 recognition ratewith even fewer sensing values than in Ref 17

On the MNIST database a baseline method18 whichuses as input to a second-degree polynomial classifier the40-dimensional feature vector of the training data obtainedwith PCA has an error rate of 33 compared with our 332with 41 sensing values and 312 with 63 sensing values asshown in Fig 6 (a)

State-of-the-art performance on MNIST has beenachieved by a recurrent convolutional neural network(RCNN) approach which introduces recurrent connectionsinto each convolutional layer19 The approach reaches atesting error of 031 and uses 670000 parameters Asargued in the following section our goal is to explore theHMS algorithm in terms of the best recognition rate reachedwith as few sensing actions as possible rather than increasingcomplexity for maximum performance

CONCLUSIONS AND FUTUREWORKWe have presented a novel algorithm which aims at efficientsensing and have evaluated the efficiency in terms ofthe resulting recognition performance We assume theavailability of a dataset of images that represent the type ofobjects and scenes that need to be sensed and recognizedBased on these data the goal is to learn a sensing strategysuch that recognition is possible with few sensing valuesAlthough we use a very simple nearest-neighbor classifieron easy benchmarks such as COIL and ALOI with onlysome classes perfect recognition is possible with onlyabout ten sensing values On harder benchmarks such asMNIST state-of-the-art performance could not be reachedbut we could show that a large number of test imagescould be recognized with only very few sensing valuesSuch performance resembles human performance sincehumans can effortlessly recognize a multitude of objectsbased on just the gist of a scene and require scrutiny forless familiar objects and more difficult recognition tasks Afurther bio-inspired element of our algorithm is foveationand we have shown that gist-like sensing and recognitionrequires the whole image whereas more refined sensing canbe reduced to only few salient locations without deterioratingrecognition performance It should be noted however thatwe do not use the saliency of the actual image but only theaverage saliency of all images in the hierarchical datasetIn terms of compressive sensing we have here proposed asensing scheme with a learned sensing matrix and we haveshown that it leads to better recognition performance thanrandom sensing In the case of foveation the sensing matrixis also sparse in addition to being learned ie adapted to thespecific dataset

A weakness of the proposed approach is that it offers arather high number of choices For example it is not obviousin which dimension to start sensing and how to increasethe dimension of the embedding manifolds In future workhowever this weakness could be turned into a benefit byexploring different strategies A further weakness is thatthe method like many others will be less efficient if manydifferent objects need to be recognized in the same sceneThis scenario would most likely require a preliminary stageof object detection and rough object segmentation

It should be noted that we are here addressing a problem that is not typically addressed in current computer-vision challenges but is becoming increasingly relevant as computer-vision systems are becoming more pervasive. While currently the focus is on maximizing recognition performance, an equally challenging problem consists of finding the simplest solution for a given problem. Simplicity can be defined in different ways, and we here adopt the approach of using a minimum number of sensing values and a simple classifier. This reduces both the required bandwidth of the sensor and the required processing power.

ACKNOWLEDGMENTS
This research was funded by the Graduate School for Computing in Medicine and Life Sciences, supported by Germany's Excellence Initiative [DFG GSC 235/1].

REFERENCES
1. E. Candès and M. Wakin, "Introduction to compressive sampling," IEEE Signal Process. Mag. 25, 21–30 (2008).
2. B. Olshausen and D. Field, "Natural image statistics and efficient coding," Netw. Comput. Neural Syst. 7, 333–339 (1996).
3. D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory 52, 1289–1306 (2006).
4. H. Schütze, E. Barth, and T. Martinetz, "An adaptive hierarchical sensing scheme for sparse signals," Proc. SPIE 9014, 151–8 (2014).
5. R. Baraniuk and M. Wakin, "Random projections of smooth manifolds," Found. Comput. Math. 9, 51–77 (2009).
6. M. Chen, J. Silva, J. Paisley, C. Wang, D. Dunson, and L. Carin, "Compressive sensing on manifolds using a nonparametric mixture of factor analyzers: algorithm and performance bounds," IEEE Trans. Signal Process. 58, 6140–6155 (2010).
7. I. Burciu, A. Ion-Margineanu, T. Martinetz, and E. Barth, "Visual manifold sensing," Proc. SPIE 9014, 481–8 (2014).
8. I. Burciu, T. Martinetz, and E. Barth, "Foveated manifold sensing for object recognition," Proc. IEEE Black Sea Conf. on Communications and Networking (2015), pp. 196–200.
9. S. Roweis and L. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science 290, 2323–2326 (2000).
10. N. Tajunisha and V. Saravanan, "An efficient method to improve the clustering performance for high dimensional data by Principal Component Analysis and modified K-means," International Journal of Database Management Systems 3 (2011).
11. D. Arthur and S. Vassilvitskii, "K-means++: the advantages of careful seeding," SODA '07: Proc. Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (2007), pp. 1027–1035.
12. C. Zetzsche, K. Schill, H. Deubel, G. Krieger, E. Umkehrer, and S. Beinlich, "Investigation of a sensorimotor system for saccadic scene analysis: an integrated approach," From Animals to Animats 5: Proc. Fifth Int. Conf. on Simulation of Adaptive Behavior, edited by R. Pfeifer et al. (MIT Press, 1998), Vol. 5, pp. 120–126.
13. B. Jähne, H. Haußecker, and P. Geißler, Handbook of Computer Vision and Applications (Academic Press, 1999), Vol. 2.
14. S. A. Nene, S. K. Nayar, and H. Murase, Columbia Object Image Library (COIL-20), Technical Report CUCS-005-96 (1996).
15. J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders, "The Amsterdam Library of Object Images," Int. J. Comput. Vis. 61, 103–112 (2005).
16. F. Dornaika, Y. El Traboulsi, B. Cases, and A. Assoum, "Image classification via semi-supervised feature extraction with out-of-sample extension," Advances in Visual Computing: ISVC, Part I (2014), Vol. 8887, pp. 182–192.
17. E. Vural and C. Guillemot, "Out-of-sample generalizations for supervised manifold learning for classification," http://arxiv.org/abs/1502.02410 [cs.CV] (2015).
18. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE 86, 2278–2324 (1998).
19. M. Liang and X. Hu, "Recurrent convolutional neural network for object recognition," IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (IEEE, Piscataway, NJ, 2015).


