
Visualizing Dimensionally-Reduced Data

Transcript
Page 1: Visualizing Dimensionally-Reduced Data

Matthew Brehmer

ACM BELIV Workshop Nov 10, 2014

Michael Sedlmair

Stephen Ingram

Tamara Munzner

Visualizing Dimensionally-Reduced Data: Interviews with Analysts and a Characterization of Task Sequences

Page 3: Visualizing Dimensionally-Reduced Data

A need for abstract task characterization…

…yet specific to data type

Page 4: Visualizing Dimensionally-Reduced Data

an abundance of vis task characterization

Heer & Shneiderman (2012)
Mullins & Treu (1993)
Springmeyer et al. (1992)
R. E. Roth (2013)
Pike, Stasko, et al. (2009)
Amar & Stasko (2004)
Pirolli & Card (2005)
Card, Mackinlay, Shneiderman (1999)
Klein, Moon, & Hoffman (2006)
Liu & Stasko (2010)
Spence (2007)
Casner (1991)
Chi & Riedl (1998)
Chuah & Roth (1996)
Gotz & Zhou (2008)
Roth & Mattis (1990)
Shneiderman (1996)
Wehrend & Lewis (1990)
Yi, Stasko, et al. (2007)
Zhou & Feiner (1998)
Andrienko & Andrienko (2006)
Buja et al. (1996)
Dix & Ellis (1998)
Keim (2002)
Valiati et al. (2006)
Tweedie (1997)
Ward & Yang (2004)
Amar, Eagan, & Stasko (2005)
Brehmer & Munzner (2013)
Schulz et al. (2013)

Page 5: Visualizing Dimensionally-Reduced Data

data-type specific task characterization

Shneiderman. (1996) IEEE Symp. Visual Languages: Vis Tasks for 1D, 2D, 3D, Multi-Dim, Temporal, Tree, & Network Data

Henry & Fekete. (2006) ACM BELIV Workshop: Vis Tasks for Tabular Data

Lee et al. (2006) ACM BELIV Workshop: Vis Tasks for Graph Data

Lammarsch et al. (2012) EuroVA Workshop: Vis Tasks for Time-Oriented Data

Page 6: Visualizing Dimensionally-Reduced Data

…what about DR data?

Page 7: Visualizing Dimensionally-Reduced Data

dimensionality reduction (e.g. PCA, MDS) & vis
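
A minimal sketch (not from the talk) of the DR-then-visualize pipeline this slide names: project high-dimensional items to 2D with PCA or MDS and inspect the result in a scatterplot. The digits dataset, the subsampling, and the plotting choices are illustrative assumptions.

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.manifold import MDS

    # Items with 64 original dimensions each; subsampled to keep MDS quick.
    X, y = load_digits(return_X_y=True)
    X, y = X[:500], y[:500]

    pca_2d = PCA(n_components=2).fit_transform(X)                   # linear DR
    mds_2d = MDS(n_components=2, random_state=0).fit_transform(X)   # distance-preserving DR

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.scatter(pca_2d[:, 0], pca_2d[:, 1], c=y, s=8); ax1.set_title("PCA")
    ax2.scatter(mds_2d[:, 0], mds_2d[:, 1], c=y, s=8); ax2.set_title("MDS")
    plt.show()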

Page 10: Visualizing Dimensionally-Reduced Data

10 analyst interviewees, 6 domains

Human-computer interaction (x3)
Bioinformatics (x3)
Policy analysis
Computational chemistry
Social network analysis
Investigative journalism

Page 11: Visualizing Dimensionally-Reduced Data

in need of a framework

Brehmer & Munzner. IEEE TVCG / Proc. InfoVis (2013).

Munzner (2014)

[Diagram: three chained why? / what? / how? task-abstraction blocks]

Page 12: Visualizing Dimensionally-Reduced Data

[Diagram: task sequence: start → DR → name synthesized dimensions]

contribution: vis task sequences for dr data

Page 13: Visualizing Dimensionally-Reduced Data

[Diagram: 2 dimension-oriented task sequences: start → DR → name synthesized dimensions, optionally followed by map synthesized to original dimensions]

contribution: vis task sequences for dr data
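
The two dimension-oriented sequences can be made concrete with PCA, where the mapping from each synthesized dimension back to the original dimensions is explicit in the component loadings. A hedged sketch, assuming the scikit-learn wine dataset purely for illustration; it is not the analysts' data or tooling from the interviews.

    import numpy as np
    from sklearn.datasets import load_wine
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    data = load_wine()
    X = StandardScaler().fit_transform(data.data)   # n original dimensions
    pca = PCA(n_components=2).fit(X)                # m synthesized dimensions, m < n

    for i, component in enumerate(pca.components_):
        # Rank original dimensions by |loading|: supports both naming the
        # synthesized dimension and mapping it back to original dimensions.
        top = np.argsort(np.abs(component))[::-1][:3]
        print(f"synthesized dim {i}: strongest original dims ->",
              [data.feature_names[j] for j in top])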

Page 14: Visualizing Dimensionally-Reduced Data

[Diagram: 2 dimension-oriented task sequences (start → DR → name synthesized dimensions, optionally followed by map synthesized to original dimensions) and 3 cluster-oriented task sequences (start → DR → verify clusters, optionally extended with name clusters and match clusters and classes)]

contribution: vis task sequences for dr data
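
A matching sketch for the cluster-oriented sequences: check whether the 2D projection shows real cluster structure (verify clusters), then compare the clusters against known class labels (match clusters and classes). Running k-means on the projection and scoring with silhouette and adjusted Rand index are stand-ins chosen for illustration, not steps prescribed by the talk.

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score, adjusted_rand_score

    X, classes = load_iris(return_X_y=True)
    proj = PCA(n_components=2).fit_transform(X)     # the DR step

    clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(proj)

    # Verify clusters: is there separable structure in the projection?
    print("silhouette:", round(silhouette_score(proj, clusters), 3))

    # Match clusters and classes: do clusters line up with the known labels?
    print("adjusted Rand index:", round(adjusted_rand_score(classes, clusters), 3))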

Page 15: Visualizing Dimensionally-Reduced Data

implications for vis evaluation

“Seven Scenarios”: Lam et al. IEEE TVCG 2012.

understanding work practices
evaluating visual data analysis & reasoning
evaluating communication through vis
evaluating collaborative data analysis
evaluating user performance
evaluating user experience
evaluating vis algorithms

[Image: first page of Lam, Bertini, Isenberg, Plaisant, & Carpendale, “Empirical Studies in Information Visualization: Seven Scenarios,” IEEE TVCG 18(9), 2012]

Page 16: Visualizing Dimensionally-Reduced Data

Matthew Brehmer

Michael Sedlmair

Stephen Ingram

Tamara Munzner

thanks: UBC InfoVis group and UBC Multimodal User Experience group

Domain-Agnostic and Data-Type-Specific Task Characterization

Tasks in Sequence, not in Isolation

Task Characterization for BELIV

cs.ubc.ca/labs/imager/tr/2014/DRVisTasks/

conclusion

Page 17: Visualizing Dimensionally-Reduced Data

DR Vis Tasks: Supplemental

Page 18: Visualizing Dimensionally-Reduced Data

implications for vis evaluation

Lam et al. IEEE TVCG 2012.

Page 19: Visualizing Dimensionally-Reduced Data

implications for vis evaluation

understanding work practices

pre-design requirements analysis, especially in problem-driven design studies; use task sequences as code set

Lam et al. IEEE TVCG 2012.

Page 20: Visualizing Dimensionally-Reduced Data

implications for vis evaluation

understanding work practices

pre-design requirements analysis, especially in problem-driven design studies; use task sequences as code set

evaluating user performance

task sequences can inform experimental design and participant instructions when evaluating techniques combining DR + Vis

Lam et al. IEEE TVCG 2012.

Page 21: Visualizing Dimensionally-Reduced Data

implications for vis evaluation

understanding work practices

pre-design requirements analysis, especially in problem-driven design studies; use task sequences as code set

evaluating user performance

task sequences can inform experimental design and participant instructions when evaluating techniques combining DR + Vis

evaluating user experience

inform participant instructions in a think-aloud evaluation

focus questionnaire questions…

Lam et al. IEEE TVCG 2012.

Page 22: Visualizing Dimensionally-Reduced Data

implications for vis evaluation

understanding work practices

pre-design requirements analysis, especially in problem-driven design studies; use task sequences as code set

evaluating user performance

task sequences can inform experimental design and participant instructions when evaluating techniques combining DR + Vis

evaluating user experience

inform participant instructions in a think-aloud evaluation

focus questionnaire questions…

evaluating visual data analysis & reasoning

analyze the use of deployed DR + Vis tools in the wild; use task sequences as code …

focus diary / interview questions

Lam et al. IEEE TVCG 2012.

Page 23: Visualizing Dimensionally-Reduced Data

10 analyst interviewees, 6 domains

Human-computer interaction (x3)
Bioinformatics (x3)
Policy analysis
Computational chemistry
Social network analysis
Investigative journalism

Page 24: Visualizing Dimensionally-Reduced Data

10 analyst interviewees, 6 domains

Human-computer interaction (x3)
Bioinformatics (x3)
Policy analysis
Computational chemistry
Social network analysis
Investigative journalism

[Images: first pages of four DR + Vis technique papers:

Reveret, Favreau, Depraz, & Cani. “Morphable model of quadrupeds skeletons for animating 3D animals.” Eurographics/ACM SIGGRAPH Symposium on Computer Animation, 2005.

Buja & Swayne. “Visualization Methodology for Multidimensional Scaling.” 2004.

Matusik, Pfister, Brand, & McMillan. “A Data-Driven Reflectance Model.”

Tenenbaum, de Silva, & Langford. “A Global Geometric Framework for Nonlinear Dimensionality Reduction.” Science 290, 2000.]
+ 4 known use cases from DR + Vis technique papers

Page 25: Visualizing Dimensionally-Reduced Data

a common lexicon for analysis

[Diagram: each step of the task sequences expressed as an input → output box in the why/what/how typology:

Dimensionality Reduction (dimensional synthesis): produce: derive; input: n original dimensions; output: m synthesized dims. (m < n)

Name Synthesized Dimensions: discover, generate hypotheses; search: browse; query: identify; annotate; consume, produce; input: synthesized dimensions; output: identified dimensions

Map Synthesized Dimension to Original Dimensions: discover, generate & verify hypotheses; search: browse; query: compare; consume; input: synthesized dim. + original dims.; output: mapping between synthesized & original

Verify Clusters: discover, verify hypotheses; search: locate; query: identify; consume; input: items + original dimensions; output: item clusters

Name Clusters: discover, generate hypotheses; search: browse; query: summarize; annotate; consume, produce; input: items in cluster; output: cluster names

Match Clusters and Classes: discover, verify hypotheses; search: lookup; query: compare; consume; input: clusters + classes; output: (mis)matches between clusters & classes

Tasks are chained as linked why? / what? / how? blocks with dependencies between them.]

Brehmer & Munzner. IEEE TVCG / Proc. InfoVis 2013.

domain-agnostic yet data-type-specific task characterization

