
Masking Strategies for Image Manifolds
Hamid Dadkhahi and Marco F. Duarte, Senior Member, IEEE

Abstract—We consider the problem of selecting an optimal mask for an image manifold, i.e., choosing a subset of the dimensions of the image that preserves the manifold's geometric structure present in the original data. Such masking implements a form of compressive sensing that reduces power consumption in emerging imaging sensor platforms. Our goal is for the manifold learned from masked images to resemble its full image counterpart as closely as possible. We consider both global (Isomap) and local (LLE) manifold learning methods. In each case, the process of finding the optimal masking pattern can be cast as a binary integer program, which is computationally expensive but can be approximated by a fast greedy algorithm. For Isomap, the algorithm provides the lowest distortion between the norms of the masked secants (differences between an image and its neighbors) and their expected value. For LLE, the algorithm preserves the norms of these secants and their cliques (which include differences between pairs of neighbors) up to a scaling factor. Numerical experiments show that the manifold structure is preserved through the data-dependent masking process, even for modest mask sizes.

Index Terms—Manifold learning, dimensionality reduction, linear embedding, image masking, compressive sensing

I. INTRODUCTION

RECENT advances in sensing technology have enabled a massive increase in the dimensionality of data captured from digital sensing systems. Naturally, the high dimensionality of the data affects various stages of the digital systems, from data acquisition to processing and analysis. To meet communication, computation, and storage constraints, in many applications one seeks a low-dimensional embedding of the high-dimensional data that shrinks the size of the data representation while retaining the information we are interested in capturing. This problem of dimensionality reduction has attracted significant attention in the signal processing and machine learning communities.

The traditional method for dimensionality reduction is principal component analysis (PCA) [3], which successfully captures the structure of datasets well approximated by a linear subspace. However, in many parameter estimation problems, the data can be best modeled by a nonlinear manifold whose geometry cannot be captured by PCA. Manifolds are low-dimensional geometric structures that reside in a high-dimensional ambient space despite possessing merely a few degrees of freedom.

This work was supported by NSF Grant IIS-1239341. Portions of this work appeared at the IEEE Statistical Signal Processing Workshop (SSP), 2014 [1] and the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015 [2].

H. Dadkhahi and M. F. Duarte are with the Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA, 01003. E-mail: {hdadkhahi,mduarte}@ecs.umass.edu.

Manifold models are a good match for datasets associated with a physical system or event governed by a few continuous-valued parameters. Once the manifold model is formulated, any point on the manifold can be essentially represented by a low-dimensional parameter vector. Manifold learning methods aim to obtain a suitable nonlinear embedding into a low-dimensional space that preserves the geometric structure present in the higher-dimensional data. In general, nonlinear dimensionality reduction techniques can be subdivided into two main categories: (i) techniques that attempt to preserve global properties of the original data in the low-dimensional representation (e.g., Isomap [4] and diffusion maps [5]), and (ii) techniques that attempt to preserve local properties of the original data in the low-dimensional representation (e.g., Locally Linear Embedding (LLE) [6], Laplacian eigenmaps [7], and Hessian eigenmaps [8]).

For high-dimensional data, the process of data acquisition followed by a dimensionality reduction method is inherently wasteful, since we are often not interested in obtaining the full-length representation of the data. This issue has been addressed by compressive sensing, a technique to simultaneously acquire and reduce the dimensionality of sparse signals in a randomized fashion [9], [10]. As an extension of compressive sensing, the use of random projections for linear embedding of nonlinear manifold datasets has been proposed [11]–[15], where the high-dimensional data is mapped to a random subspace of lower (but sufficiently high) dimensionality. As a result, the pairwise distances between data points are preserved with high probability. Recently, random projections have been outperformed by a new data-dependent linear embedding obtained via optimization [16]. One can formulate a semidefinite program to construct a deterministic linear embedding that preserves the pairwise distances between all data points up to a desired distortion parameter.

Compressive sensing provides a good match to the requirements of cyber-physical systems, where power constraints are paramount. In such applications, one wishes to reduce the size of the representation of the data to be processed, often by applying standard compression algorithms. For instance, a fundamental challenge in the design of computational eyeglasses for gaze tracking is addressing stringent resource constraints on data acquisition and processing that include sensing fidelity and energy budget, in order to meet lifetime and size design targets [17]. A recent example implementation uses an imaging sensor architecture that can significantly reduce the power consumption of sensing by allowing pixel-level control of the image acquisition process [18]; the power consumption of imaging becomes proportional to the number of pixels to be acquired using the array. Thus, it is now possible to meet stringent power and communication requirements by designing data-dependent image masking schemes that reduce the number of pixels involved in acquisition while, like the aforementioned linear embeddings, preserving the information of interest. The selection of a masking pattern is ideally driven by knowledge of the data model that captures the relevant information in the data, such as a nonlinear manifold model for images controlled by a few degrees of freedom.

Prior work in the area of compressive imaging has considered the design of linear embeddings that allow for data processing directly from the lower-dimensional representation, with a particular emphasis on imaging [11]–[13], [16], [19]. However, while the aforementioned embeddings may reduce the computational and communication demands, they do not reduce the power consumption burden of data acquisition. This is because they require all image pixels to be sensed, and so they cannot be implemented more efficiently than standard acquisition. Thus, in order to incorporate the aforementioned new architectures into compressive imaging and enable the promised savings in power, we need to devise new mask selection approaches governed by the same principle of preservation of relevant image data as existing work in embedding design.

In this paper, we consider the problem of designing masking patterns that preserve the geometric structure of a high-dimensional dataset modeled as a nonlinear manifold. The preservation of this structure through the masking is relevant to preserving the performance of manifold learning. Note that in terms of linear embeddings, masking schemes may be described as a restriction to embeddings where the projection directions are required to correspond to canonical vectors. Previous work on linear dimensionality reduction for manifolds does not address the highly constrained (masking) setting that is motivated by our application. We consider Isomap from the global category and LLE from the local one, to show that masking algorithms are applicable to both categories.

The application of our proposed scheme to compressive sensing of images proceeds as follows. We start with a set of full-length training data, which can be collected at an initialization stage when power resources are not constrained. We then derive a masking pattern using the proposed algorithms at the computational platform (likely away from the sensor), and program the sensor to acquire only the pixels contained in the mask for subsequent captures in order to reduce the power consumption under normal operation. The cost of data acquisition (which in terms of power consumption is proportional to the number of pixels/data dimensions with the current hardware) is the main motivation for our framework, rather than the cost of computation for training or the cost of manifold learning. As in most examples where compressive sensing is applicable, the goal here is to trade off simple compression at the sensor (in order to reduce the cost of acquisition) against additional computation that can be incurred outside of the sensor.

This paper is organized as follows. After briefly reviewing the relevant literature in Section II, we propose in Section III both optimization problems and greedy algorithms that select a masking pattern as a subset of the dimensions in the high-dimensional space containing the original dataset, with the general goal being to preserve the structure of the dataset that is relevant during manifold learning. In Section IV, we evaluate the proposed algorithms over several manifold-modeled datasets, including eye gaze tracking images representative of the computational eyeglasses application. The proposed masking patterns can lead to significant savings in energy consumption of the sensing devices, while incurring minimal loss in the performance of manifold learning. We offer discussions and some directions for future work in Section V. Finally, concluding remarks are given in Section VI.

II. BACKGROUND

A. Manifold Models and Linear Embeddings

A set of data points X = {x_1, x_2, . . . , x_n} in a high-dimensional ambient space R^d that have been generated by an ℓ-dimensional parameter corresponds to a sampling of a manifold M ⊂ R^d. Given the high-dimensional data set X, we would like to find the parameterization that has generated the manifold. One way to discover this parametrization is to embed the high-dimensional data on the manifold into a low-dimensional space R^m so that the geometry of the manifold is preserved. Dimensionality reduction methods are devised so as to preserve such geometry, which is measured by a neighborhood-preserving criterion that varies depending on the specific algorithm.

A linear embedding is defined as a linear mapping Φ ∈ R^{m×d} that embeds the data in the ambient space R^d into a low-dimensional space R^m. In many applications, linear embeddings are desirable as dimensionality reduction methods due to their computational efficiency and generalizability. The latter attribute renders linear embeddings easily applicable to unseen test data points. Principal component analysis (PCA) is perhaps the most popular scheme for linear dimensionality reduction of high-dimensional data [3]. PCA is defined as the orthogonal projection of the data onto a linear subspace of lower dimension m such that the variance of the projected data is maximized. The projection vectors {φ_i}_{i=1}^m are found by solving the sequential problems

\phi_i = \arg\max_{\phi_i : \|\phi_i\|_2 = 1} \sum_{\ell=1}^{n} \left( \phi_i^T x_\ell - \phi_i^T \bar{x} \right)^2 \quad \text{subject to } \phi_i \perp \phi_j \;\; \forall\, j < i,    (1)

where \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i represents the mean of the data, and ⊥ designates orthogonality. Note that Φ = [φ_1 φ_2 . . . φ_m]^T. Conveniently, the solutions to (1) are the sequence of the dominant eigenvectors of the data covariance matrix [3].
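As a concrete illustration of (1) (added here for reference, not part of the original text), the principal directions can be obtained from the eigendecomposition of the sample covariance matrix. The sketch below uses NumPy and assumes the data are stored as rows of a matrix X.

```python
import numpy as np

def pca_embedding(X, m):
    """Return the top-m principal directions (rows of Phi) and the projected data.

    X : (n, d) array, one data point per row.
    m : target embedding dimension.
    """
    Xc = X - X.mean(axis=0)                # center the data
    C = Xc.T @ Xc / X.shape[0]             # d x d sample covariance
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    Phi = eigvecs[:, ::-1][:, :m].T        # m dominant eigenvectors as rows
    return Phi, Xc @ Phi.T                 # embedding matrix and embedded data
```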

B. Nonlinear Manifolds and Manifold Learning

Unfortunately, PCA fails to preserve the geometric structure of a nonlinear manifold, i.e., a manifold where the mapping from parameters to data is nonlinear. Particularly, since PCA arbitrarily distorts individual pairwise distances, it can significantly change the local geometry of the manifold. Fortunately, several nonlinear manifold learning methods can successfully embed the data into a low-dimensional model while preserving such local geometry in order to simplify the parameter estimation process.

1) Isomap: The Isomap method aims to preserve the pairwise geodesic distances between data points [4]. The geodesic distance is defined as the length of the shortest path between two data points x_i and x_j (x_i, x_j ∈ M) along the surface of the manifold M and is denoted by d_G(x_i, x_j). Isomap first finds an approximation to the geodesic distances between each pair of data points by constructing a neighborhood graph in which each point is connected only to its k nearest neighbors; the edge weights are equal to the corresponding pairwise distances. For neighboring pairs of data points, the Euclidean distance provides a good approximation for the geodesic distance, i.e., d_G(x_i, x_j) ≈ ‖x_i − x_j‖_2 for x_j ∈ N_k(x_i), where N_k(x_i) designates the set of k nearest neighbors of the point x_i ∈ X. For non-neighboring points, the length of the shortest path along the neighborhood graph is used to estimate the geodesic distance. Then, multidimensional scaling (MDS) [20] is applied to the resulting geodesic distance matrix to find a set of low-dimensional points that best match such distances. Note that Isomap is a global method, since the manifold structure is defined by geodesic distances that depend on distances between data points throughout the manifold.
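For readers who wish to reproduce this pipeline, an off-the-shelf Isomap embedding (neighborhood graph, shortest paths, then MDS) is available in scikit-learn. The snippet below is an illustrative usage sketch with hypothetical variable names, not the authors' released MATLAB code.

```python
from sklearn.manifold import Isomap

# X: (n, d) array of vectorized images, k: neighborhood size, ell: embedding dimension
def isomap_embed(X, k=10, ell=3):
    model = Isomap(n_neighbors=k, n_components=ell)
    Y = model.fit_transform(X)   # low-dimensional coordinates, shape (n, ell)
    return Y, model
```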

2) Locally Linear Embedding: As an alternative, the locally linear embedding (LLE) method retains the geometric structure of the manifold as captured by locally linear fits [6]. More precisely, LLE computes coefficients of the best approximation to each data point by a weighted linear combination of its k nearest neighbors. The weights W = [w_{ij}] are found such that the squared Euclidean approximation error is minimized:

W = \arg\min_{W} \sum_{i=1}^{n} \Big\| x_i - \sum_{j : x_j \in N_k(x_i)} w_{ij} x_j \Big\|_2^2 \quad \text{subject to } \sum_{j : x_j \in N_k(x_i)} w_{ij} = 1, \;\; i = 1, \ldots, n.    (2)

LLE then finds a set of points in an m-dimensional space that minimizes the error of the local approximations given by the weights W. More precisely, LLE finds the set Y = {y_1, y_2, . . . , y_n} ⊂ R^m that minimizes the squared Euclidean error function

Y = \arg\min_{y_i} \sum_{i=1}^{n} \Big\| y_i - \sum_{j : x_j \in N_k(x_i)} w_{ij} y_j \Big\|_2^2 \quad \text{subject to } \sum_{i=1}^{n} y_i = 0, \;\; \frac{1}{n} \sum_{i=1}^{n} y_i y_i^T = I,    (3)

where the first and second constraints are to remove the degrees of freedom due to translation and scaling of the coordinates, in order to obtain a unique solution for the embedding. Note that LLE is considered a local method, since the manifold structure at each point is determined only by neighboring data points.
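As with Isomap, a standard implementation of LLE (solving (2) and then (3)) is available in scikit-learn; the following usage sketch is an illustration only, under the same assumptions as the Isomap snippet above.

```python
from sklearn.manifold import LocallyLinearEmbedding

# X: (n, d) array of vectorized images, k: neighborhood size, ell: embedding dimension
def lle_embed(X, k=10, ell=3):
    model = LocallyLinearEmbedding(n_neighbors=k, n_components=ell)
    Y = model.fit_transform(X)   # coordinates satisfying the centering/normalization constraints of (3)
    return Y, model
```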

C. Linear Dimensionality Reduction for Nonlinear Manifolds

An alternative linear embedding approach to PCA is the method of random projections, where the entries of the linear dimensionality reduction matrix are drawn independently following a standard probability distribution such as Gaussian or Rademacher. It has been shown that such random projections preserve the relevant pairwise distances between data points with high probability [11]–[15], so that manifold learning algorithms can be applied on the dimensionality-reduced data with very small distortion. The drawbacks of random projections are two-fold: (i) their theoretical guarantees are asymptotic and probabilistic, and (ii) random embeddings are independent of the geometric structure of the data, and thus cannot take advantage of training data.

Recently, a near-isometric linear embedding method obtained via convex optimization (referred to as NuMax) has been proposed [16], [21]. The key concept in NuMax is to obtain an isometry on the set of pairwise data point differences, dubbed secants, after being normalized to lie on the unit sphere:

S = \left\{ \frac{x_i - x_j}{\| x_i - x_j \|_2} : x_i, x_j \in \mathcal{M} \right\}.

NuMax relies on a convex optimization problem that finds an embedding Φ with minimum dimension such that the secants are preserved up to a norm distortion parameter δ. More precisely, the search for a linear embedding is cast as the following rank-minimization problem:

P^* = \arg\min \; \mathrm{rank}(P) \quad \text{subject to } |s^T P s - 1| \le \delta \;\; \forall\, s \in S, \;\; P \succeq 0.    (4)

After P^* is obtained, one can factorize P^* = Φ^T Φ in order to obtain the desired low-dimensional embedding Φ. We note that the rank of the solution determines the dimensionality of the embedding, and is controlled by the choice of the distortion parameter δ ∈ [0, 1]. Note also that s^T P s = ‖Φs‖_2^2; thus, the first constraint essentially upper-bounds the distortion incurred by each secant s ∈ S. The problem (4) is NP-hard, but one may instead solve its nuclear norm relaxation, where the rank of P is replaced by its nuclear norm ‖P‖_∗. Since P is a positive semidefinite symmetric matrix, its nuclear norm amounts to its trace, and thus the optimization in (4) is equivalent to a semidefinite program and can be solved in polynomial time.
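The trace relaxation of (4) can be prototyped with a generic SDP solver. The sketch below uses CVXPY and is an assumption-laden illustration that only scales to small secant sets; it is not the NuMax implementation of [16], which relies on a more scalable scheme.

```python
import numpy as np
import cvxpy as cp

def numax_relaxation(secants, delta):
    """Trace (nuclear-norm) relaxation of the rank-minimization problem (4).

    secants : (num_secants, d) array of unit-norm secants.
    delta   : distortion parameter in [0, 1].
    """
    d = secants.shape[1]
    P = cp.Variable((d, d), PSD=True)
    constraints = [cp.abs(cp.quad_form(s, P) - 1) <= delta for s in secants]
    prob = cp.Problem(cp.Minimize(cp.trace(P)), constraints)
    prob.solve()
    # Factorize P* = Phi^T Phi, keeping only the significant eigenvalues.
    w, V = np.linalg.eigh(P.value)
    keep = w > 1e-6
    Phi = (V[:, keep] * np.sqrt(w[keep])).T   # rows of Phi span the embedding
    return Phi
```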

D. Connection with Feature Selection

The problem of image masking design is reminiscent of feature selection in supervised and unsupervised learning [22], [23]. Previous work on feature selection for unsupervised learning problems (such as manifold learning) is mostly focused on clustering [24]. Spectral feature selection (SPEC) is an unsupervised feature selection method based on spectral graph theory [25]. In SPEC, a pairwise instance similarity metric is used in order to select features that are most consistent with the innate structure of the data. In particular, the radial basis function (RBF) kernel, given by exp(−‖x_i − x_j‖^2 / (2σ^2)), is used to measure pairwise similarity between data points. An undirected graph is then constructed with data points as vertices and pairwise similarities as edge weights. According to spectral graph theory, the features are selected so as to preserve the spectrum of the resulting Laplacian matrix. Note that the Laplacian score, proposed earlier in [26], is a special case of SPEC. Similarity preserving feature selection (SPFS) further extends SPEC by overcoming its limitation in handling redundant features [27]. In other words, SPFS considers both similarity preservation and correlation among features in order to avoid choosing redundant features.

III. MANIFOLD MASKING

In this section, we adopt the criteria used in linear and nonlinear embedding algorithms from Section II to develop algorithms that obtain structure-preserving masking patterns for manifold-modeled data. To unify notation, we are seeking a masking index set Ω = {ω_1, . . . , ω_m} of cardinality m that is a subset of the dimensions [d] := {1, 2, . . . , d} of the high-dimensional space containing the original dataset.

A. Principal Coordinate Analysis

A natural adaptation of PCA to mask design is to find the m canonical basis vectors (rather than arbitrary orthogonal vectors in PCA) that span the canonical subspace which captures the highest variance of the data through projection. We call the resulting approach principal coordinate analysis (PCoA), which works as follows. Substituting φ_i with canonical basis elements e_i in (1) yields

\omega_i = \arg\max_{\omega \in [d]} \sum_{\ell=1}^{n} \left( x_\ell(\omega) - \bar{x}(\omega) \right)^2 \quad \text{subject to } \omega_i \ne \omega_j \;\; \forall\, j < i,    (5)

and so the masking pattern Ω is found by solving (5) sequentially for i = 1, . . . , m. In practice, this masking pattern can be obtained greedily by selecting the indices of the m dimensions with the highest variances across the dataset.
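In code, PCoA therefore reduces to picking the m highest-variance coordinates. A minimal NumPy sketch (an added illustration with hypothetical names) follows.

```python
import numpy as np

def pcoa_mask(X, m):
    """Select the m pixel indices with the largest variance across the dataset.

    X : (n, d) array of vectorized images; m : mask size.
    Returns the masking index set Omega as an integer array.
    """
    variances = X.var(axis=0)                # per-dimension variance
    return np.argsort(variances)[::-1][:m]   # indices of the m largest variances
```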

In the sequel, we design algorithms tailored to nonlinear manifold learning methods by preserving the metrics of the manifold relevant to the particular method.

B. Isomap-Aware Mask Selection

Inspired by the optimization approach of NuMax and the neighborhood-preservation notion of Isomap, we formulate a method for manifold masking that aims at minimizing the distortion incurred by pairwise distances of neighboring data points.

Recall that Isomap attempts to preserve the geodesic distances rather than the Euclidean distances of data points. Since only the Euclidean distances of neighboring data points match their geodesic counterparts (and the geodesic distance between any two points is found as a function of the geodesic distances between the neighboring points), we are interested in devising a masking operator that preserves the pairwise distances of each data point with its k nearest neighbors. This gives rise to the reduced secant set

S_k = \left\{ \frac{x_i - x_j}{\| x_i - x_j \|_2} : i \in [n], \; x_j \in N_k(x_i) \right\} \subseteq S.

To simplify notation, we define the masking linear operator Ψ : x_i ↦ {x_i(j)}_{j∈Ω} corresponding to the masking index set Ω. We also denote the column vectors a_i with entries a_i(j) = s_i^2(j) for all j ∈ [d] and for each i ∈ [|S_k|]. Since the secants are normalized, we have Σ_{j=1}^d a_i(j) = 1 for all i ∈ [|S_k|].

Since a masking operator cannot preserve the norm of the secants, we study the behavior of the masked secant norm under a uniform distribution for the masks Ω. Taking the expectation of the secant norms after masking over the random variable Ω yields

E\big[\|\Psi s_i\|_2^2\big] = E\Big[\sum_{j \in \Omega} a_i(j)\Big] \overset{(a)}{=} \sum_{\Omega : |\Omega| = m} P(\Omega) \Big(\sum_{j \in \Omega} a_i(j)\Big) \overset{(b)}{=} \sum_{\Omega : |\Omega| = m} \frac{1}{\binom{d}{m}} \sum_{j \in \Omega} a_i(j) = \frac{1}{\binom{d}{m}} \sum_{\Omega : |\Omega| = m} \sum_{j \in \Omega} a_i(j) \overset{(c)}{=} \frac{1}{\binom{d}{m}} \binom{d}{m} \frac{m}{d} \sum_{j=1}^{d} a_i(j) \overset{(d)}{=} \frac{m}{d},

where (a) is by the definition of expectation, (b) is due to the masks being equiprobable, (c) is due to the fact that each term a_i(j) appears exactly \binom{d-1}{m-1} = \binom{d}{m}\frac{m}{d} times in the double summation since the number of m-subsets of the set [d] that include a particular element is \binom{d-1}{m-1}, and (d) is due to the fact that the secants are normalized.

Thus, the norms of the secants s_i ∈ S_k are inevitably subject to a compaction factor of \sqrt{m/d} in expectation by the masking operator Ψ; this behavior bears out empirically when random masks are used for the datasets considered in Section IV. As a result, we will aim to find a masking operator Ψ such that for all s_i ∈ S_k we obtain ‖Ψ s_i‖_2^2 ≈ m/d. Note that ‖Ψ s_i‖_2^2 = Σ_{j∈Ω} s_i^2(j) = Σ_{j=1}^d s_i^2(j) z(j) = a_i^T z, where the indicator vector z is defined by

z(j) = \begin{cases} 1 & \text{if } j \in \Omega, \\ 0 & \text{otherwise.} \end{cases}    (6)

In words, the vector z ∈ {0, 1}^d encodes the membership of the masking index set Ω ⊆ [d].
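The m/d compaction can be sanity-checked numerically. The short script below (an illustration added here, not from the paper) draws uniformly random masks and verifies that the average masked squared secant norm concentrates around m/d.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, num_secants, num_masks = 400, 100, 50, 2000

# Random unit-norm secants (rows) and their squared entries a_i.
S = rng.standard_normal((num_secants, d))
S /= np.linalg.norm(S, axis=1, keepdims=True)
A = S ** 2

# Average squared norm of the masked secants over uniformly random masks.
norms = [A[:, rng.choice(d, size=m, replace=False)].sum(axis=1).mean()
         for _ in range(num_masks)]
print(np.mean(norms), m / d)   # the two values should nearly coincide
```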


The average and maximum distortion of the secant norms caused by the masking can be expressed in terms of the vector z and the squared secants matrix A := [a_1 a_2 . . . a_{|S_k|}]^T as follows:

\sum_{s_i \in S_k} \Big| \|\Psi s_i\|_2^2 - \frac{m}{d} \Big| = \Big\| A z - \frac{m}{d} \mathbf{1}_{|S_k|} \Big\|_1, \qquad \max_{s_i \in S_k} \Big| \|\Psi s_i\|_2^2 - \frac{m}{d} \Big| = \Big\| A z - \frac{m}{d} \mathbf{1}_{|S_k|} \Big\|_\infty,

respectively, where \mathbf{1}_{|S_k|} denotes the |S_k|-dimensional all-ones column vector. Thus, we find the optimal masking pattern by casting the following integer program:

z^* = \arg\min_{z} \Big\| A z - \frac{m}{d} \mathbf{1}_{|S_k|} \Big\|_p \quad \text{subject to } \mathbf{1}_d^T z = m, \;\; z \in \{0, 1\}^d,    (7)

where p = 1 and p = ∞ correspond to optimizing the average and maximum secant norm distortion caused by the masking, respectively.¹ The equality constraint dictates that only m dimensions are to be retained in the masking process.

¹Note that we have also tried p = 2 numerically, but the masks obtained do not preserve the desired manifold structure. Also, note that we tried considering a scaling factor γ as an optimization parameter in (7) in place of the constant m/d, but the latter performed best.

Algorithm 1 Manifold-Aware Pixel Selection for Isomap (MAPS-Isomap)
Inputs: normalized squared secants matrix A, number of dimensions m
Outputs: masking index set Ω
Initialize: Ω ← ∅
for i = 1 → m do
  A_Ω ← A_Ω · 1_{|Ω|}   {compute current masked secant squared norms}
  ω_i ← arg min_{ω ∈ Ω^c} ‖A_ω + A_Ω − (i/d) 1_{|S_k|}‖_p   {minimize aggregate difference with E[‖Ψ s_i‖_2^2]}
  Ω ← Ω ∪ {ω_i}   {add selected dimension to the masking index set}
end for

The integer program (7) is computationally intractable even for moderate-size datasets [28]. We note that the non-integer relaxation of (7) results in the trivial solution z^* = (m/d) 1_d. Note also that the matrix A depends on the dataset used; thus in general it does not possess the properties needed for relaxations of integer programs to be successful (e.g., being totally unimodular, having binary entries, etc.). We also attempted a Lagrangian non-integer relaxation of the following form:

z^* = \arg\min_{z} \Big\| A z - \frac{m}{d} \mathbf{1}_{|S_k|} \Big\|_p + \lambda \| z \|_1,

where again p = 1 or p = ∞. Note that since this is a non-integer relaxation, we consider the sparsity pattern of the solution to obtain a mask. We observed that (a) the performance is worse than that obtained by the integer program, and (b) it is difficult to obtain the value of the Lagrangian multiplier needed for a particular mask size.

We propose a heuristic greedy algorithm that can find an approximate solution for (7) in a drastically reduced time. The greedy approach in Algorithm 1, which we refer to as Manifold-Aware Pixel Selection for Isomap (MAPS-Isomap), gives an approximate solution for the ℓ_p-norm minimization in (7). The algorithm iteratively selects elements of the masking index set Ω as a function of the squared secants matrix A. We initialize Ω as the empty set and denote Ω^c = [d] \ Ω. At iteration i of the algorithm, we find a new dimension that, when added to the existing dimensions in Ω, causes the squared norm of the masked secant to match the expected value of i/d as closely as possible. More precisely, at step i of the algorithm, we find the column of A indexed by ω ∈ Ω^c (which is indicated by A_ω), whose addition with the sum of previously chosen columns A_Ω = Σ_{ω∈Ω} A_ω has minimum distance (in ℓ_p-norm) to (i/d) 1_{|S_k|}. Note that A_Ω = Az, where z again denotes the indicator vector for the masking index set Ω ⊆ [d]; thus, the metric guiding the greedy selection matches the objective function of the integer program (7).

The computational complexity of MAPS-Isomap is O(mdkn). To see this, note that in each of the m iterations the search for ω ∈ Ω^c considers at most d elements, and the number of arithmetic operations in computing the ℓ_p-norm term is O(|S_k|). Thus, we have

T_{MAPS-Isomap}(n, m, k, d) = O(m d |S_k|) = O(m d k n),

where the last equality uses the fact that |S_k| ≤ kn.
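A direct NumPy transcription of Algorithm 1 might look as follows; this is an illustrative sketch under the notation above, not the authors' released MATLAB code. It greedily grows the mask so that the masked squared secant norms track i/d.

```python
import numpy as np

def maps_isomap(A, m, p=1):
    """Greedy MAPS-Isomap mask selection (Algorithm 1).

    A : (num_secants, d) matrix of squared normalized secants (rows sum to 1).
    m : mask size; p : norm for the aggregate distortion (1 or np.inf).
    Returns the masking index set Omega as a list of dimension indices.
    """
    num_secants, d = A.shape
    omega, mask_norms = [], np.zeros(num_secants)   # running sum of selected columns (A_Omega)
    for i in range(1, m + 1):
        target = (i / d) * np.ones(num_secants)
        candidates = [j for j in range(d) if j not in omega]
        # Distortion incurred if column j is added to the current mask.
        errors = [np.linalg.norm(mask_norms + A[:, j] - target, ord=p) for j in candidates]
        best = candidates[int(np.argmin(errors))]
        omega.append(best)
        mask_norms += A[:, best]
    return omega
```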

C. LLE-Aware Mask Selection

Next, we propose a greedy algorithm for selection of an LLE-aware masking pattern that attempts to preserve the weights w_{ij} obtained from the optimization in (2). Preserving these weights would in turn maintain the embedding Y found from (3) through the image masking process.

The rationale behind the proposed algorithm is as follows. The weights w_{ij} for j ∈ N_k(x_i) are preserved if both the lengths of the secants involving x_i (up to a scaling factor) and the angles between these secants are preserved. Geometrically, this can be achieved if the distances between all the points in the set C_{k+1}(x_i) := N_k(x_i) ∪ {x_i} are preserved up to a scaling factor. For this purpose, we define the secant clique for x_i as

S_{k+1}(x_i) := \{ x_{j_1} - x_{j_2} : x_{j_1}, x_{j_2} \in C_{k+1}(x_i) \};    (8)

our goal for LLE-aware mask selection is to preserve the norms of these secants up to a scaling factor. This requirement can be captured by a normalized inner product commonly referred to as the cosine similarity measure, defined as sim(α, β) := ⟨α, β⟩ / (‖α‖_2 ‖β‖_2). To implement our method, we define a 3-dimensional array B of size c × d × n, where c = \binom{k+1}{2} denotes the number of elements in each secant clique S_{k+1}(x_i). The array has entries B(ℓ, j, i) = s_{iℓ}(j)^2,


where s_{iℓ} denotes the ℓth secant contained in S_{k+1}(x_i). In words, every 2-D slice of B, denoted by B_i := B(:, :, i), corresponds to the squared secants matrix for the secant clique S_{k+1}(x_i), and the ℓth row of B_i corresponds to the ℓth secant in S_{k+1}(x_i).

We now define our LLE-aware mask metric. The vector α = B_i z, where z is the mask indicator vector from (6), contains the squared norms of the masked secants from S_{k+1}(x_i) as its entries. Similarly, the vector β = B_i 1_d will contain the squared norms of the full secants in the same set. Maximizing the cosine similarity sim(α, β) promotes these two vectors being a scaled version of one another, i.e., the norms of the masked secants approximately being equal to a scaling of the full secant norms. Note that since LLE is a local algorithm, the value of this scaling can vary over data points without incurring distortion of the manifold structure. In order to incorporate the cosine similarity measure for all data points, we maximize the sum of the aforementioned similarities over all data points as follows:

z = \arg\max_{z} \sum_{i=1}^{n} \frac{\langle B_i z, B_i \mathbf{1}_d \rangle}{\| B_i z \|_2 \, \| B_i \mathbf{1}_d \|_2} \quad \text{subject to } \mathbf{1}_d^T z = m, \;\; z \in \{0, 1\}^d.    (9)

Finding an optimal solution for z from (9) has a combinatorial (exponential) time complexity. An approximation can be obtained by greedily selecting the masking elements that maximize the value of the mask metric, one at a time. The proposed algorithm, which we call Manifold-Aware Pixel Selection for LLE (MAPS-LLE), is given in Algorithm 2.

The computational complexity of MAPS-LLE is O(mk^2 nd). To see this, note that the complexity of computing each of the matrices α, θ, and β is proportional to the number of elements of the array B involved in the summation; thus the aforementioned complexities are O(cdn), O(cmn), and O(cn), respectively. In addition, the computation of the cosine similarity vector λ can be done in O(cn) time. As a result, the complexity of MAPS-LLE is given by

T_{MAPS-LLE}(n, m, k, d) = O(cdn) + O(m)\big( O(cmn) + O(d)( O(cn) + O(cn) ) \big) = O(cdn) + O(cm^2 n) + O(mcnd) \overset{(a)}{=} O(mcnd) \overset{(b)}{=} O(mk^2 nd),

where in (a) we exploit the fact that m < d and (b) is due to c = O(k^2).
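For concreteness, the greedy selection of Algorithm 2 can be sketched in NumPy as below (again an illustration under the notation of this section, not the authors' code); B is assumed to be stored with shape (n, c, d) so that B[i] is the squared secant matrix of the clique of x_i.

```python
import numpy as np

def maps_lle(B, m):
    """Greedy MAPS-LLE mask selection (Algorithm 2).

    B : (n, c, d) array; B[i] holds the squared secants of the clique S_{k+1}(x_i).
    m : mask size. Returns the masking index set Omega as a list of indices.
    """
    n, c, d = B.shape
    alpha = B.sum(axis=2)                     # (n, c) squared full secant norms
    omega, theta = [], np.zeros((n, c))       # running squared masked secant norms
    for _ in range(m):
        best_j, best_score = None, -np.inf
        for j in range(d):
            if j in omega:
                continue
            beta = theta + B[:, :, j]         # norms if dimension j is added to the mask
            # Sum over data points of the cosine similarity between full and masked norms.
            num = (alpha * beta).sum(axis=1)
            den = np.linalg.norm(alpha, axis=1) * np.linalg.norm(beta, axis=1) + 1e-12
            score = (num / den).sum()
            if score > best_score:
                best_j, best_score = j, score
        omega.append(best_j)
        theta += B[:, :, best_j]
    return omega
```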

IV. NUMERICAL EXPERIMENTS

In this section, we present a set of experimental results that compare the performance of the proposed algorithms to those in the existing linear embedding and feature selection literatures, in terms of preservation of the low-dimensional structure of several nonlinear manifolds.²

We once again remark that the goal of the masking schemes proposed here is to reduce the number of data dimensions (in order to reduce data acquisition costs) while preserving the manifold structure. Thus, if we apply a manifold learning algorithm (e.g., Isomap) on the masked data, the resulting embedding is ideally as close as possible to that obtained from the full data. In addition, having obtained the embedding of the masked images from a manifold, we would like to evaluate how well the embedding can be extended to new masked images — a setup known in the literature as out-of-sample extension [29]. Thus, our comparison with standard dimensionality reduction schemes aims to show whether a performance gap exists if manifold learning schemes are applied to the masked images versus the original (full) images.

We evaluate the methods described in Sections II and III. In addition, we consider random masking, Sparse PCA [30], [31], and two unsupervised feature selection methods, SPEC and SPFS. In random masking, we pick an m-subset of the d data dimensions uniformly at random. Sparse PCA (SPCA) is a variation of PCA in which sparsity is enforced in the principal components. Note that since the support of the principal components is not required to be the same, we focus on the support of the first principal component so that we can translate Sparse PCA into a masking scheme. Note also that we use the SPFS-LAR version of SPFS, which is favored by the authors of SPFS, since it does not require extra parameter tuning (other than the parameter σ of the RBF kernel function). In our experiments, we perform a grid search over {1, 2, . . . , 10} in order to find the value of the parameter σ that works best.

For our experiments, we use five standard manifold modeling datasets — the MNIST dataset [32], the Heads dataset [33],³ the Faces dataset [6], the Statue dataset [34]–[36], and the Hands dataset [4] — as well as one custom eye-tracking dataset from a computational eyeglass prototype, as detailed in Table I. For the MNIST dataset, we focus on the subset corresponding to the handwritten digit 2's. The Eyeglasses dataset corresponds to captures from a prototype implementation of computational eyeglasses that use the imaging sensor array of [18].

The algorithms are tested for linear embeddings⁴ of dimensions m = 50, 100, 150, 200, 250, 300; for the masking algorithms of Section III, m provides the size of the masking (number of dimensions preserved), while for the linear embedding algorithms of Section II, m provides the dimensionality of the embedding. Note that since the linear embeddings employ all d dimensions of the original data, the latter algorithms have an intrinsic performance advantage against the former.

²MATLAB code for generation of the results of this section is available at http://www.ecs.umass.edu/~mduarte/Software.html.

³This dataset is originally termed the Faces dataset. However, in order to avoid confusion with the Faces dataset of [6], we rename it to the Heads dataset.

⁴We excluded NuMax from consideration since its performance on embeddings from d to m (which is moderately large here) dimensions is similar to that of PCA for our datasets.


Algorithm 2 Manifold-Aware Pixel Selection for LLE (MAPS-LLE)
Inputs: neighborhood clique secant array B, masking size m
Outputs: masking index set Ω
Initialize: Ω ← ∅; α ← Σ_{j∈[d]} B(:, j, :)   {Compute matrix of squared secant norms.}
for i = 1 → m do
  θ ← Σ_{j∈Ω} B(:, j, :)   {Compute matrix of squared masked secant norms for current masking set Ω.}
  for j ∈ Ω^C do
    β ← θ + B(:, j, :)   {Update squared masked secant norms when j is added to mask Ω.}
    λ(j) ← Σ_{t∈[n]} ⟨α(:, t), β(:, t)⟩ / (‖α(:, t)‖_2 ‖β(:, t)‖_2)   {Compute cosine similarity measure for updated mask.}
  end for
  ω ← arg max_{j∈Ω^C} λ(j)   {Find new mask element that maximizes cosine similarity.}
  Ω ← Ω ∪ {ω}   {Add selected dimension to the masking index set.}
end for

TABLE I
SUMMARY OF EXPERIMENTAL DATASETS

Dataset                        | Eyeglasses | MNIST   | Statue  | Heads   | Faces   | Hands
Number of images n             | 929        | 1000    | 960     | 698     | 1965    | 1000
Embedding dim. ℓ               | 2          | 5       | 3       | 3       | 3       | 4
Image dim. d                   | 40 × 40    | 28 × 28 | 51 × 34 | 32 × 32 | 28 × 20 | 64 × 64
Neighborhood size for Isomap k | 12         | 10      | 12      | 10      | 9       | 8
Neighborhood size for LLE k    | 12         | 10      | 12      | 7       | 10      | 10

The performance of random masking is averaged over 100 independent draws in each case.

The combinatorial nature of the integer program (7) renders it significantly expensive in computation, even for the small dimensions of the data shown in Table I (not converging even after 24 hours in our experiments). In contrast, the remaining masking algorithms each take only up to 20 seconds (for m = 300) to complete using the same computing platform. In [1], MAPS-Isomap has been shown to be a good approximation of the integer program. Hence, here we only consider MAPS-Isomap for Isomap-aware mask selection.

Figure 1 indicates the masking patterns associated with different masking methods for all the datasets for a mask size of m = 100 pixels; the active pixels (i.e., the pixels that are preserved by the mask) are marked in white. As shown in this figure, MAPS-Isomap and MAPS-LLE do not select the pixels with the highest variance, in contrast to PCoA. The pixel masks selected by the MAPS algorithms suggest that the pixels with the highest variations are not necessarily more informative of the underlying manifold structure.

We note in passing that in certain LLE experiments we obtained data covariance matrices that are singular or nearly singular (often due to masking). In such cases, the covariance matrix can be conditioned by adding a small multiple of the identity matrix [37], [38].

A. Preservation of Nonlinear Manifold Structure

For each selection of masking algorithm and size, we apply the manifold learning algorithm (either Isomap or LLE) directly on the masked images. We then compare the performance of the manifold embedding obtained from the masked datasets to that of the manifold embedding from the full dataset using different performance metrics.

For Isomap, we use the following two criteria to evaluate the performance of masking or embedding methods. First, we use residual variance as a global metric to measure how well the Euclidean distances in the embedded space match the geodesic distances in the ambient space [4]. For each dataset, we pick the embedding dimensionality ℓ to be the value after which the residual variance ceases to decrease substantially with added dimensions. Note that the obtained values of ℓ agree with the intuitive number of degrees of freedom for the Heads dataset (two rotation angles – pitch and yaw – for orientation, plus an illumination variable), the Eyeglasses dataset (2-D gaze locations), and the Statue dataset (2-D rotation plus camera position). Second, we use the percentage of preserved nearest neighbors [16]. More precisely, for a given neighborhood of size k, we obtain the percentage of the k-nearest neighbors in the full d-dimensional data that are among the k-nearest neighbors when the masked image manifold is embedded.

For LLE, we consider the following embedding error. Suppose the pairs (X, Y) and (X′, Y′) designate the ambient and embedded sets of vectors for the full and masked data, respectively. Having found the weights w_{ij} from the full data via (2), we define the embedding error for the masked data in the following way:

e = \sum_{i=1}^{n} \Big\| y_i' - \sum_{j : x_j \in N_k(x_i)} w_{ij} y_j' \Big\|_2^2.    (10)

Fig. 1. Masks obtained for each dataset (Eyeglasses, MNIST, Statue, Heads, Faces, Hands) with masking size of m = 100 via different masking schemes (Full Image, SPCA, PCoA, SPFS, MAPS-LLE, MAPS-Isomap, Random).

The rationale behind this definition of the embedding error is that, ideally, the embedded vectors y_i′ obtained from masked images should provide a good linear fitting using the neighborhood approximation weights obtained from the original (full) images. In other words, (10) finds the amount of deviation of Y′ from Y, which minimizes the value of this score, cf. (3).
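To make the embedding error (10) concrete, a small helper like the following could compute it (an added illustration; the full-data LLE weights are assumed to be stored as a dense or sparse n × n matrix).

```python
import numpy as np

def lle_embedding_error(Y_masked, W):
    """Embedding error (10): deviation of the masked-data embedding from the
    local linear fits given by the full-data LLE weights.

    Y_masked : (n, m) embedding obtained from the masked images.
    W        : (n, n) weight matrix from (2), with W[i, j] = w_ij (zero outside N_k(x_i)).
    """
    residual = Y_masked - W @ Y_masked   # y'_i minus its weighted neighbor reconstruction
    return float(np.sum(residual ** 2))
```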

Since LLE is a local algorithm and does not preserve the global structure of the manifold, there is no guarantee of preservation of nearest neighbors beyond k in general. This was observed in our experiments through the non-monotonicity of neighborhood preservation as a function of the masking/embedding size. Thus, for LLE we do not include plots of the percentage of preserved nearest neighbors.

In Figures 2 and 3, we display the residual variance and neighborhood preservation results of different masking and embedding methods, respectively, when Isomap is used as the manifold learning algorithm. MAPS-Isomap is shown only for the choice p = 1, as setting p = ∞ yields similar results. We observe that the performance of MAPS-Isomap and MAPS-LLE is significantly and consistently better than that of random sampling, PCoA, and Sparse PCA. PCoA fails to identify the best dimensions to preserve from the original data. This failure is particularly evident for the Heads dataset, where the distribution of the image energy across the pixels is most uniform. SPCA has an erratic behavior across datasets; it performs well for some of the datasets and for moderately large values of m, but poorly for other datasets and for lower values of m. Note that, as expected, SPFS always outperforms SPEC, but is outperformed by our MAPS algorithms. Also, we have dropped the curve related to SPEC for datasets for which SPEC was performing poorly. The values of the parameter σ used for SPFS are [6, 2, 4, 4, 5] for the Eyeglasses, MNIST, Heads, Faces, and Statue datasets, respectively. Interestingly, random masking outperformed all the methods other than the proposed MAPS algorithms for the Heads and Faces datasets. This can be attributed to the activity being more spread out over the pixels for the latter datasets.

Additionally, for small values of m the linear embedding algorithms of Section II can significantly outperform the masking algorithms of Section III, which is to be expected since the former approaches employ all d dimensions of the original data. More surprisingly, we see that for sufficiently large values of m the performance of the MAPS algorithms approaches or matches that of the linear embedding algorithms, even though the embedding feasible set for masking methods is significantly reduced. The results are consistent across the datasets used in the experiments.

Figure 4 shows the embedding error plots over different datasets for the case that LLE is used as the manifold learning algorithm. Here we can see that the MAPS-LLE algorithm consistently outperforms all the other masking algorithms across all the datasets. Note that for the plots of the Heads and Faces datasets, we have dropped the SPCA curves due to their poor performance and the resulting change in the scaling of the plots. The values of the parameter σ used for SPFS are [5, 4, 4, 4, 9] for the Eyeglasses, MNIST, Heads, Faces, and Statue datasets, respectively.

Fig. 2. Performance comparison in terms of residual variance for linear embeddings (dashed lines) and masking algorithms (solid lines) with respect to original full-length data, when Isomap is used as the manifold learning algorithm. Residual variance as a function of m is used. (Panels: Eyeglasses, MNIST, Statue, Heads, Faces, Hands.)

Fig. 3. Performance comparison for linear embeddings (dashed lines) and masking algorithms (solid lines) with respect to original full-length data, when Isomap is used as the manifold learning algorithm. Percentage of preserved nearest neighbors for 20 neighbors is used. (Panels: Eyeglasses, MNIST, Statue, Heads, Faces, Hands.)

Next, we compare the performance of different masking schemes at preserving the 2-D manifolds learned (via Isomap) from the Eyeglasses dataset, containing pictures of an eye pointed in different directions, and the Heads dataset, in which a 3-D model of a head is subject to rotations in pitch and yaw. As shown in Figure 5, the 2-D manifold learned from images masked using MAPS-Isomap with m = 50 pixels resembles the 2-D manifold learned from full images. We have also verified that when the size of the mask is increased to m = 200, the 2-D manifold learned from the masked images is essentially visually identical to that learned from the full data. On the other hand, the masks chosen using random masking, SPCA, and PCoA warp the structure of the manifold learned from the masked data, which creates shortcuts between the left and right hand sides of the manifold.

Fig. 4. Performance comparison for linear embeddings (dashed lines) and masking algorithms (solid lines) with respect to original full-length data when LLE is used as the manifold learning algorithm. Embedding error as a function of m. (Panels: Eyeglasses, MNIST, Statue, Heads, Faces, Hands.)

Finally, we repeat the previous experiment for the case that LLE is used as the manifold learning algorithm, and again we consider 2-D manifolds learned from the Eyeglasses dataset. We compare the performance of MAPS-LLE in preserving the 2-D manifold from the Eyeglasses dataset with that of random masking, SPCA, and PCoA at a masking size of m = 100. As can be observed from Figure 6, the 2-D manifold learned from images masked via MAPS-LLE resembles that learned from the full images more closely than PCoA. Note that the manifold learned from random masking is warped and does not preserve the distances among data points faithfully. Also note that the LLE embedding for the Heads dataset does not provide a clear visualization of the controlled parameters.

B. Out-of-Sample Extension

Next, we consider the effect of masking on out-of-sample extension (OoSE) for manifold learning algorithms. OoSE generalizes the result of the nonlinear manifold embedding for new data points. For LLE OoSE, we use the procedure derived in [37], [39]. For Isomap OoSE, we use the procedure suggested in [39], [40].

The experiments in this section pursue the following general framework. First, we apply the masks designed in the previous section on the dataset. Next, we perform OoSE in a leave-one-out fashion on the masked dataset excluding the selected data point. Then, we compare the embedding for the new data point to its counterpart obtained from the embedding of all the data points from the full data (including the point which is left out as an "out of sample").

Note that for Isomap we cannot directly compare these two points [41], as embeddings learned from different samplings of the manifold are often subject to translation, rotation, and scaling. These variations must be addressed via manifold alignment before the embedded datasets are compared. We find the optimal alignment of the original manifold and the OoSE manifold via Procrustes analysis [42], [43] and apply the resulting translational, rotational, and scaling components to the OoSE manifold. Finally, we measure the OoSE error as the ℓ_2 distance between the two manifolds for the embedded test point, averaged across all test points.
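For reference (an added sketch, not the paper's exact pipeline), SciPy ships a Procrustes routine that optimally translates, scales, and rotates one point set onto another; it can be used to align the OoSE embedding with the full-data embedding before measuring the error.

```python
import numpy as np
from scipy.spatial import procrustes

def aligned_oose_error(Y_full, Y_oose, test_idx):
    """Align the OoSE embedding to the full-data embedding via Procrustes
    analysis, then return the l2 error at the held-out test point.

    Note: scipy's procrustes standardizes both point sets (centering and
    scaling), so the error is measured in that normalized frame.
    """
    Y_full_std, Y_oose_aligned, _ = procrustes(Y_full, Y_oose)
    return float(np.linalg.norm(Y_full_std[test_idx] - Y_oose_aligned[test_idx]))
```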

Due to the local nature of LLE, the embedding obtained via OoSE remains unchanged from the original for most of the data points. Hence, it is logical to only consider the embedding error for the points that are affected by OoSE. Let x_{i_0} indicate the out-of-sample point, and define the set N′_k(x_{i_0}) of points affected by OoSE on point x_{i_0} as

N'_k(x_{i_0}) = \{ x_i \in \mathcal{X} : x_{i_0} \in N_k(x_i) \text{ or } x_i = x_{i_0} \},

i.e., the set of all the points that have x_{i_0} as one of their neighbors, plus x_{i_0} itself. Denote the set of indices for points contained in N′_k(x_{i_0}) as I(i_0) = \{ i : x_i \in N'_k(x_{i_0}) \}. We then define a version of the metric (10) that accounts only for the local linear fits of the points affected by OoSE as

e_{OoSE} = \frac{1}{n} \sum_{i_0 = 1}^{n} \sum_{i \in I(i_0)} \Big\| y_i' - \sum_{j : x_j \in N_k(x_i)} w_{ij} y_j' \Big\|_2^2,    (11)

which we term the average OoSE embedding error.

Figures 7 and 8 show the performance of OoSE from masked images for Isomap and LLE as the manifold learning algorithm, respectively. In each case, due to the high computational complexity of the leave-one-out experiment in this setting, we only compare the performance of the respective MAPS algorithm with that of random masking. As can be observed from the figures, for both Isomap and LLE OoSE, the respective MAPS algorithms consistently outperform random masking for all datasets.

C. Eye Gaze Estimation

Finally, we consider an application of manifold modelsin our motivating computational eyeglasses platform. Moreprecisely, we focus on the Eyeglasses dataset, illustrated inFigure 9, which is collected for the purpose of training anestimation algorithm for eye gaze position in a 2-D imageplane. The dataset corresponds to a collection of image

Page 11: Masking Strategies for Image Manifoldsmduarte/images/MSIM2014.pdfMasking Strategies for Image Manifolds Hamid Dadkhahi and Marco F. Duarte, Senior Member, IEEE Abstract—We consider

11

Full Data MAPS-Isomap Random SPCA PCoAFig. 5. Performance comparison of two-dimensional projections of eyeglasses (top row) and heads dataset (bottom row) masked with m = 50 viadifferent methods after Isomap manifold learning.

Full Data MAPS-LLE Random SPCA PCoAFig. 6. Performance comparison of two-dimensional projections of eyeglasses dataset masked with m = 100 via different methods after LLE manifoldlearning.

[Figure 7: six panels (Eyedata, MNIST, Statue, Heads, Faces, Hands); x-axis: Masking Size m; y-axis: Average OoSE Error; curves: Full Data, MAPS-Isomap, Random.]
Fig. 7. Performance evaluation of Isomap OoSE for various datasets. The MAPS-Isomap algorithm consistently outperforms random masking.

The dataset corresponds to a collection of image captures of an eye from a camera mounted on an eyeglass frame as the subject focuses their gaze on a dense grid of known positions (size 31 × 30, covering a 600 × 600 pixel screen projection) that is used as ground truth.

Most of the literature on eye gaze estimation has focused on feature-based approaches, where explicit geometric features, such as the contours and corners of the pupil, limbus, and iris, are used to extract features of the eye [44], [45]. Unfortunately, such methods require all the pixels of the eye image and are therefore not compatible with image masking. Alternatively, an appearance-based method that adopts the nonlinear manifold models at the center of this paper has been proposed in [46].


[Figure 8: six panels (Eyeglasses, MNIST, Statue, Heads, Faces, Hands); x-axis: Masking Size m; y-axis: Average OoSE Embedding Error; curves: Full Data, MAPS-LLE, Random.]
Fig. 8. Performance evaluation of LLE out-of-sample extension for various datasets. MAPS-LLE consistently outperforms random masking.

The idea behind this method is to find a nonlinear manifold embedding of the original dataset X and extend it to the 2-D parameter space samples given by the eye gaze ground truth. The proposed method employs the weights obtained by LLE, when applied to the training image dataset together with a testing image, $X \cup \{x_t\}$, and applies these weights in the parameter space to estimate the parameters of the test point.
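A hedged sketch of this estimation step is given below; the neighborhood size k, the regularization constant, and the variable names (X_train for the masked training images, G_train for the ground-truth gaze positions, x_t for the masked test image) are our own illustrative choices rather than the exact settings of [46].

import numpy as np

def estimate_gaze(X_train, G_train, x_t, k=12, reg=1e-3):
    # Nearest neighbors of the masked test image among the masked training images
    dists = np.linalg.norm(X_train - x_t, axis=1)
    nbrs = np.argsort(dists)[:k]
    # LLE-style weights: minimize ||x_t - sum_j w_j x_j||^2 subject to sum_j w_j = 1
    Z = X_train[nbrs] - x_t                  # neighbors shifted to the test point
    C = Z @ Z.T                              # local Gram matrix
    C += reg * np.trace(C) * np.eye(k)       # regularization for numerical stability
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()
    # Apply the same weights in the 2-D gaze parameter space
    return w @ G_train[nbrs]

In the leave-one-out evaluation described next, such a routine would be called once per image, with the remaining images serving as the training set.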

We evaluate the performance of different masking methods on eye gaze estimation in a leave-one-out fashion, where each one of the eye images is used as the test data, the rest of the images are considered as training data, and the LLE weights are computed from the masked images. Figure 9 shows the average gaze estimation error e (in terms of pixels in the projected screen) as a function of the lower dimension m for the different linear embedding and masking algorithms, together with a baseline that employs the full-length original data. While the MAPS algorithms again outperform their masking counterparts, there is only a minor gap in performance between estimation from masked and full-length data. Furthermore, we believe that the improvement obtained by PCA over full-length data is due to the high level of noise observed in the image captures obtained with the low-power imaging architecture [18].

V. DISCUSSION AND FUTURE WORK

Our numerical experiments indicate that while each MAPS algorithm is well suited to its particular manifold learning approach, MAPS-LLE often performs well when applied together with Isomap. We conjecture that this is because LLE, by preserving local structure, also preserves the global structure that is relevant to Isomap.

Since there are many other types of geometric information leveraged by alternative manifold learning algorithms, it would be interesting to derive masking algorithms for them as well. Furthermore, there are several frameworks that can benefit from generalizations of the proposed masking algorithms.

[Figure 9, right panel: x-axis: Embedding Dim./Masking Size m; y-axis: Average Gaze Estimation Error; curves: Full Data, PCA, SPCA, PCoA, MAPS-LLE, MAPS-Isomap, SPFS, Random.]
Fig. 9. Left: Example images from the Eyeglasses dataset. Right: Performance of eye gaze estimation using an appearance-based method from embedded and masked images as a function of m.

For instance, masking algorithms designed for datasets that are expressed as a union of manifolds can find applications in classification and pattern recognition. One may also leverage temporal information in video sequences to design more efficient manifold masking algorithms that take advantage of such temporal correlation.

Regarding the connection between feature selection schemes and the proposed masking algorithms, note that the application of feature selection in supervised learning problems is driven by the goal of minimizing the estimation distortion or the regression/classification error. Our proposed manifold learning feature selection schemes are instead driven by the goal of minimizing the distortion between the embedding obtained via nonlinear manifold learning from the selected features and the embedding obtained from all features. For this purpose, we have derived data metrics that are specific to the geometric structure exploited by the considered manifold learning algorithms. The use of such a metric in place of the actual learning algorithm links our proposed approaches to the filter class of feature selection methods. One could similarly derive alternative approaches to mask design by leveraging alternative feature selection schemes (such as backward or bidirectional elimination).


VI. CONCLUSIONS

We have considered the problem of selecting image masks that aim to preserve the nonlinear manifold structure used in parameter estimation from images, so that the manifolds can be learned directly from the masked image data. Such a formulation enables a new form of compressive sensing using novel imaging sensors that feature power consumption proportional to the number of pixels sensed. Our experimental evidence shows that the algorithms proposed for Isomap and LLE manifold learning outperform baseline approaches, while requiring only a fraction of the computational cost. As a specific example, we have shown the potential of manifold learning from masked images for an eye gaze tracking application in cyber-physical systems.

REFERENCES

[1] H. Dadkhahi and M. F. Duarte, "Masking schemes for image manifolds," in IEEE Statistical Signal Proc. Workshop (SSP), Gold Coast, Australia, 2014, pp. 256–259.
[2] ——, "Image masking schemes for local manifold learning methods," in IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Brisbane, Australia, 2015, pp. 5768–5772.
[3] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). Secaucus, NJ: Springer-Verlag, 2006.
[4] J. B. Tenenbaum, V. d. Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
[5] R. R. Coifman and S. Lafon, "Diffusion maps," Applied and Computational Harmonic Analysis, vol. 21, no. 1, pp. 5–30, July 2006.
[6] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
[7] M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Computation, vol. 15, no. 6, pp. 1373–1396, Mar. 2003.
[8] D. L. Donoho and C. Grimes, "Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data," Proc. Nat. Acad. Sciences (PNAS), vol. 100, no. 10, pp. 5591–5596, May 2003.
[9] D. L. Donoho, "Compressed sensing," IEEE Trans. Info. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[10] E. J. Candes, "Compressive sampling," in Proc. Int. Congress of Mathematicians, vol. 3, Madrid, Spain, Aug. 2006, pp. 1433–1452.
[11] R. Baraniuk and M. Wakin, "Random projections of smooth manifolds," Found. Comp. Math., vol. 9, no. 1, pp. 51–77, Jan. 2009.
[12] C. Hegde, M. Wakin, and R. Baraniuk, "Random projections for manifold learning," in Neural Info. Proc. Systems (NIPS), Vancouver, BC, Dec. 2007, pp. 641–648.
[13] Y. Freund, S. Dasgupta, M. Kabra, and N. Verma, "Learning the structure of manifolds using random projections," in Neural Info. Proc. Systems (NIPS), Vancouver, BC, Dec. 2007, pp. 473–480.
[14] K. L. Clarkson, "Tighter bounds for random projections of manifolds," in Annu. Symp. Computational Geometry. ACM, 2008.
[15] H. L. Yap, M. B. Wakin, and C. J. Rozell, "Stable manifold embeddings with structured random matrices," IEEE J. Selected Topics in Signal Processing, vol. 7, no. 4, Sep. 2013.
[16] C. Hegde, A. Sankaranarayanan, W. Yin, and R. Baraniuk, "A convex approach for learning near-isometric linear embeddings," Submitted to J. Machine Learning Research, 2012.
[17] A. Mayberry, P. Hu, B. Marlin, C. Salthouse, and D. Ganesan, "iShadow: Design of a wearable, real-time mobile gaze tracker," in Int. Conf. Mobile Systems, Applications and Services (MobiSys), Bretton Woods, NH, June 2014, pp. 82–94.
[18] Centeye, Inc., "Stonyman and Hawksbill vision chips," Available online at http://centeye.com/products/current-vision-chips-2, Nov. 2011.
[19] J. M. Duarte-Carvajalino and G. Sapiro, "Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization," IEEE Trans. Image Proc., vol. 18, no. 7, pp. 1395–1408, July 2009.
[20] T. Cox and M. Cox, Multidimensional Scaling. Boca Raton: Chapman & Hall/CRC, 2001.
[21] C. Hegde, A. C. Sankaranarayanan, and R. G. Baraniuk, "Near-isometric linear embeddings of manifolds," in IEEE Statistical Signal Proc. Workshop (SSP), Ann Arbor, MI, Aug. 2012, pp. 728–731.
[22] H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining. Norwell, MA: Kluwer, 1998.
[23] A. J. Miller, Subset Selection in Regression. London, England: Chapman and Hall, 1990.
[24] H. Liu and L. Yu, "Toward integrating feature selection algorithms for classification and clustering," IEEE Trans. Knowledge and Data Engineering, vol. 17, no. 4, pp. 491–502, Apr. 2005.
[25] Z. Zhao and H. Liu, "Spectral feature selection for supervised and unsupervised learning," in International Conference on Machine Learning (ICML), 2007, pp. 1151–1157.
[26] X. He, D. Cai, and P. Niyogi, "Laplacian score for feature selection," in Neural Information Processing Systems (NIPS), 2005.
[27] Z. Zhao, L. Wang, H. Liu, and J. Ye, "On similarity preserving feature selection," IEEE Trans. on Knowledge and Data Engineering, vol. 25, no. 3, pp. 619–632, March 2013.
[28] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Upper Saddle River, NJ: Prentice Hall, 1982.
[29] L. J. P. van der Maaten, E. O. Postma, and H. J. van den Herik, "Dimensionality Reduction: A Comparative Review," MICC, Maastricht University, Maastricht, Netherlands, Tech. Rep., Feb. 2008.
[30] H. Zou, T. Hastie, and R. Tibshirani, "Sparse principal component analysis," J. Computational and Graphical Statistics, vol. 15, no. 2, pp. 262–286, 2006.
[31] A. d'Aspremont, L. El Ghaoui, M. Jordan, and G. Lanckriet, "A direct formulation of sparse PCA using semidefinite programming," SIAM Review, vol. 49, no. 3, pp. 434–448, July 2007.
[32] Y. LeCun and C. Cortes, "MNIST handwritten digit database," Available online at http://yann.lecun.com/exdb/mnist, 1998.
[33] J. B. Tenenbaum, "Data sets for nonlinear dimensionality reduction," Available online at http://isomap.stanford.edu/datasets.html, 2000.
[34] R. Pless and I. Simon, "Using thousands of images of an object," in Joint Conf. Information Science (JCIS), Research Triangle Park, NC, Mar. 2002, pp. 684–687.
[35] ——, "Embedding images in non-flat spaces," in Int. Conf. Imaging Science, Systems, and Technology, 2002, pp. 182–188.
[36] R. Pless and R. Souvenir, "A survey of manifold learning for images," IPSJ Trans. Computer Vision and Applications, vol. 1, pp. 83–94, 2009.
[37] L. K. Saul and S. T. Roweis, "Think globally, fit locally: Unsupervised learning of low dimensional manifolds," J. Machine Learning Research, vol. 4, pp. 119–155, Dec. 2003.
[38] ——, "An introduction to locally linear embedding," AT&T Labs Research, Tech. Rep., 2002.
[39] Y. Bengio, J.-F. Paiement, and P. Vincent, "Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and spectral clustering," in Neural Info. Proc. Systems (NIPS), Vancouver, BC, Dec. 2003, pp. 177–184.
[40] V. d. Silva and J. B. Tenenbaum, "Global versus local methods in nonlinear dimensionality reduction," in Neural Info. Proc. Systems (NIPS), Vancouver, BC, Dec. 2003, pp. 705–712.
[41] F. Dornaika and B. Raduncanu, "Out-of-sample embedding for manifold learning applied to face recognition," in IEEE Conf. Computer Vision and Pattern Recognition Workshops (CVPRW), Portland, OR, June 2013, pp. 862–868.
[42] G. H. Golub and C. F. Van Loan, Matrix Computations (3rd Ed.). Baltimore, MD: Johns Hopkins University Press, 1996.
[43] C. Wang and S. Mahadevan, "Manifold alignment using Procrustes analysis," in Int. Conf. Machine Learning (ICML), New York, NY, 2008, pp. 1120–1127.
[44] C. Morimoto and M. Mimica, "Eye gaze tracking techniques for interactive applications," Computer Vision and Image Understanding, vol. 98, no. 1, pp. 4–24, Apr. 2005.
[45] D. W. Hansen and Q. Ji, "In the eye of the beholder: A survey of models for eyes and gaze," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 32, no. 3, pp. 478–500, Mar. 2010.
[46] K. Tan, D. Kriegman, and N. Ahuja, "Appearance-based eye gaze estimation," in IEEE Workshop on the Application of Computer Vision (WACV), Orlando, FL, Dec. 2002, pp. 191–195.

