
Visualizing Likelihood Density Functions via Optimal Region Projection

Hal Canary (a, 1), Russell M. Taylor II (a), Cory Quammen (a), Scott Pratt (b), Facundo A. Gomez (b), Brian O’Shea (b), Christopher G. Healey (c)

(a) Department of Computer Science, University of North Carolina at Chapel Hill
(b) Department of Physics and Astronomy, Michigan State University
(c) Department of Computer Science, North Carolina State University

Abstract

Effective visualization of high-likelihood regions of parameter space is severely hampered by the large number of parameter dimensions that many models have. We present a novel technique, Optimal Percentile Region Projection, to visualize a high-dimensional likelihood density function that enables the viewer to understand the shape of the high-likelihood region. Optimal Percentile Region Projection has three novel components: First, we select the region of high likelihood in the high-dimensional space before projecting its shadow into a lower-dimensional projected space. Second, we analyze features on the surface of the region in the projected space to select the projection direction that shows the most interesting parameter dependencies. Finally, we use a three-dimensional projection space to show features that are not salient in only two dimensions. The viewer can also choose sets of axes to project along to explore subsets of the parameter space, using either the original parameter axes or principal-component axes. The technique was evaluated by our domain-science collaborators, who found it to be superior to their existing workflow both when there were interesting dependencies between parameters and when there were not.

Keywords: Uncertainty, Parameter Space Analysis, Visualization, Likelihood Density Function

1. Introduction

A basic question in any field of science is how to choose the theory that best fits the evidence. Given a set of experimental observations, how does one find the model that best fits the data? And after choosing a model, how does one quantify the level of confidence in that model?

This research addresses the specific case of comparing the explanatory power of variations on a single model where those variations can be described by a list of model parameters that can vary continuously. In statistics, the term likelihood (L) refers to the probability of a set of observations given a set of parameter values, viewed as a function of those parameter values. The likelihood of a set of parameter values is calculated by comparing a set of model outputs with observed quantities.

We refer to the set of possible parameter values as parameter space. Because the integral of a finite density over a zero-radius sphere is zero, the probability that any particular point in parameter space is correct (even the location with maximum likelihood) is zero. Thus, our collaborators are interested in identifying the shape of the high-likelihood region of parameter space, which tells them how the parameters interact in the region of highest likelihood.

We collaborate with researchers studying galaxy formation and relativistic particle collisions who run large ensemble simulations to try to determine the most likely parameter values for models of the fundamental behavior of the universe.

The models under study by our collaborators have between 5 and 20 parameters. For the larger models, uniform sampling of the entire parameter space with a grid fine enough to reveal important details is not feasible given the memory sizes and computational power available to them. Even if it were, direct visualization of n-D results cannot be done on a 2D or 3D display without some sort of projection. This research presents novel visualization tools that display the shape and extent of high-likelihood regions of parameter space. These tools make features salient that could not be seen using previous techniques.

2. Background and Related Work

We first describe the background mathematics used by our technique to project likelihood from the high-dimensional parameter space into a lower-dimensional visualization space, and then describe existing techniques for doing this projection and display.

2.1. Likelihood Density Function

Many problems in both the natural and social sciences now involve large-scale models characterized by numerous parameters. These models are then often compared to experimental data sets, which in some cases are distilled from peta-scale observations. This results in a high-dimensional scalar field (one dimension per parameter) that describes how likely it is that the parameters associated with each point match experimental results. It is this scalar field whose properties we display.

Preprint submitted to Computers & Graphics February 27, 2014

Perhaps the most common method for determining the optimal values of the set $\vec{x}$ of $p$ model parameters $x_1 \cdots x_p$ is to compare the model against a set $\vec{y}$ of measurements $y_1 \cdots y_M$ and calculate the implausibility, $\chi^2$, as a function of $\vec{x}$, where $\chi^2(\vec{x}) \ge 0$ describes the “poorness” of the fit for a specific point in parameter space and is zero for a perfect fit,

$$\chi^2(\vec{x}) \equiv \sum_a \frac{\left(y_a^{(\mathrm{mod})}(\vec{x}) - y_a^{(\mathrm{exp})}\right)^2}{\sigma_a^2}, \qquad (1)$$

where $a \in 1..M$ sums over all measurements, $y_a^{(\mathrm{mod})}$ refers to the $a$th measurement computed by the model, and $y_a^{(\mathrm{exp})}$ to the corresponding experimental measurement. $\sigma_a$ is a measure of the uncertainty for comparing the model to the experiment, and can come from uncertainties in the model or from expected experiment measurement errors. As $\sigma_a$ approaches zero, the implausibility approaches infinity for model measurements that differ at all from experiment.

If the uncertainty involved in comparing the model to the data is normally distributed, and if there is no prior information about the parameters (a flat prior distribution), Bayes' theorem tells us that the likelihood that the point in parameter space could reproduce the data is given by the likelihood density function L:

$$L(\vec{y}\,|\,\vec{x}) \sim \exp\left\{-\frac{\chi^2(\vec{x})}{2}\right\}. \qquad (2)$$

In some cases, one is interested in only the point of minimum $\chi^2$ (maximum likelihood), but a much more appropriate goal is to understand the entire distribution $L(\vec{y}\,|\,\vec{x})$, so that one knows not only the most likely point, but understands the range and distribution of likely values of $\vec{x}$.
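A minimal sketch of Equations (1) and (2) in Python follows. The function names and the numbers are illustrative assumptions, not part of any model discussed in this paper, and the normalization constant of L is omitted.

```python
import math

def chi_squared(y_model, y_exp, sigma):
    """Implausibility from Eq. (1): sum of squared, uncertainty-scaled residuals."""
    return sum(((m - e) / s) ** 2 for m, e, s in zip(y_model, y_exp, sigma))

def likelihood(y_model, y_exp, sigma):
    """Unnormalized likelihood from Eq. (2): exp(-chi^2 / 2)."""
    return math.exp(-0.5 * chi_squared(y_model, y_exp, sigma))

# Hypothetical measurements and uncertainties, for illustration only.
y_exp = [1.0, 2.0, 3.0]
sigma = [0.1, 0.2, 0.5]

perfect = likelihood([1.0, 2.0, 3.0], y_exp, sigma)  # chi^2 = 0, so L = 1
off = likelihood([1.1, 2.0, 3.5], y_exp, sigma)      # chi^2 = 1 + 0 + 1 = 2
```

A model output that misses two measurements by one standard uncertainty each has an implausibility of 2, so its unnormalized likelihood drops from 1 to exp(-1).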

2.2. Markov chain Monte-Carlo sampling

The standard method of computing marginal probability from a likelihood density function is to use a Markov chain Monte-Carlo (MCMC) sampling of that function. MCMC algorithms have the property that they produce point samples whose equilibrium spatial density is proportional to the local likelihood density. This has proven to be an effective way to approximate integrals over the multidimensional domain [1].

This produces a large set of points in parameter space, each of which has an associated likelihood value, whose spatial density is proportional to local likelihood. The point density can be used to integrate the likelihood by counting the number of points that fall within each bin in a spatial lattice; these counts are proportional to the integrated likelihood within each bin. These points can be projected into a lower-dimensional visualization space before being binned.
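The sampling step can be sketched as follows. This is a plain random-walk Metropolis sampler (a special case of Metropolis–Hastings with a symmetric proposal), written under the assumption that the log-likelihood is available as a Python function; `log_gauss`, the step size, and the seed are hypothetical stand-ins rather than anything used by our collaborators.

```python
import math
import random

def metropolis(log_like, start, n_steps, step_size, rng):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal.

    Returns a list of (point, log-likelihood) pairs, one per step, so every
    sample stays annotated with its likelihood value.
    """
    x = list(start)
    lx = log_like(x)
    chain = []
    for _ in range(n_steps):
        proposal = [xi + rng.gauss(0.0, step_size) for xi in x]
        lp = log_like(proposal)
        # Accept with probability min(1, L(proposal) / L(x)).
        if lp >= lx or rng.random() < math.exp(lp - lx):
            x, lx = proposal, lp
        chain.append((tuple(x), lx))
    return chain

def log_gauss(x):
    # Hypothetical stand-in target: a standard normal in each dimension.
    return -0.5 * sum(xi * xi for xi in x)

rng = random.Random(0)
chain = metropolis(log_gauss, start=[0.0, 0.0], n_steps=20000, step_size=1.0, rng=rng)
mean_x1 = sum(p[0] for p, _ in chain) / len(chain)
```

Rejected proposals repeat the current point in the chain; that repetition is what makes the point density proportional to the likelihood density at equilibrium.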

Formally: let $\mathbb{R}^p$ be the p-dimensional space of real numbers. Let $L : \mathbb{R}^p \to \mathbb{R}$ be the likelihood density function over a domain with p continuous parameters (the likelihood density at the specified coordinate). If $T : \mathbb{R}^p \to \mathbb{R}^r$ is a projection function from the p-dimensional parameter space to an r-dimensional subspace with p > r, the MCMC samples can be analyzed to estimate the integral of likelihood within the subspace. This calculation is performed by applying T to each of the MCMC samples, and then approximating the density in $\mathbb{R}^r$ by binning the results.
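The project-then-bin calculation can be illustrated with a short sketch. The helper names, the bin width, and the three sample points are assumptions made for the example; T is represented as a list of rows.

```python
from collections import Counter
import math

def project(T, x):
    """Apply an r-by-p projection matrix T (a list of rows) to a point x."""
    return tuple(sum(t * xi for t, xi in zip(row, x)) for row in T)

def bin_counts(samples, T, bin_width):
    """Project each sample with T, then count samples per lattice cell."""
    counts = Counter()
    for x in samples:
        cell = tuple(math.floor(c / bin_width) for c in project(T, x))
        counts[cell] += 1
    return counts

# Axis-aligned T(x) = (x_1, x_3) for p = 3 and r = 2; the samples are made up.
T = [[1, 0, 0],
     [0, 0, 1]]
samples = [(0.1, 5.0, 0.2), (0.3, -2.0, 0.4), (1.2, 0.0, 0.1)]
counts = bin_counts(samples, T, bin_width=0.5)
```

Because the MCMC point density is proportional to the likelihood density, the count in each cell is proportional to the likelihood integrated over everything that projects into that cell.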

For the likelihood density functions that we received from our collaborators, a few million samples were enough to approach the equilibrium distribution with a resolution finer than the scale of the features of interest.

2.3. Related work

There are a number of existing dimension-reduction techniques including projection-based methods, dimension selection, stress-based optimization methods, multidimensional scaling and others [2]. Experimental comparison among several techniques is provided in [3]. We present these techniques below and show how our approach builds on and extends them to address our collaborators’ needs.

2.4. Orthographic all-point projections

The left side of figure 1 shows two standard projections: histograms and the scatter-plot matrix.

Figure 1: Visualizing the cup function as a scatterplot matrix (left) and as an iso-surface of the projected region in 3D (right).

One example of such a projection function is $T(\vec{x}) = (x_i)$. The output values of this projection are plotted in a histogram that shows the relative likelihood of each value of the $i$th parameter.

Another example of a projection function is $T(\vec{x}) = (x_i, x_j)$, which will produce a scatter plot matrix when all combinations of i, j are considered. When the number of points is large, the scatter plots can be replaced with density plots to avoid over-plotting, as is done in figure 1.

A limitation of such projections, which send all points into the subspace and integrate the likelihood during projection, is that they provide only independent statistical information about the projected parameters and lose information about how the parameters are related to each other in the original space. To demonstrate this, consider the H function shown in figure 2,

$$H(x_1, x_2) = \begin{cases} \exp\{-x_1^2\}\,\exp\{-(2\pi x_2 \exp\{-x_1^2\})^2\} & \text{if } |x_1| \le \sqrt{\pi}, \\ 0 & \text{if } |x_1| > \sqrt{\pi}, \end{cases}$$

a two-dimensional function with a clear maximum at (0, 0) in the original 2D space. When projected onto the $x_1$-axis, each value of the $x_1$ parameter within the range $[-\sqrt{\pi}, \sqrt{\pi}]$ has equal probability:

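$$H_{\mathrm{proj}}(x_1) = \int_{-\infty}^{\infty} H(x_1, x_2)\,dx_2 = \begin{cases} \dfrac{1}{2\sqrt{\pi}} & \text{if } |x_1| \le \sqrt{\pi}, \\ 0 & \text{if } |x_1| > \sqrt{\pi}. \end{cases}$$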

Figure 2: z = H(x, y)
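The flatness of the projected H function can be checked numerically. The sketch below integrates H over $x_2$ with the trapezoid rule; the integration range and the number of quadrature points are assumptions chosen so that the Gaussian factor is well resolved at every tested $x_1$.

```python
import math

SQRT_PI = math.sqrt(math.pi)

def H(x1, x2):
    """The H function defined above; zero outside |x1| <= sqrt(pi)."""
    if abs(x1) > SQRT_PI:
        return 0.0
    a = math.exp(-x1 * x1)
    return a * math.exp(-(2.0 * math.pi * x2 * a) ** 2)

def H_proj(x1, half_width=40.0, n=40001):
    """Approximate the integral of H over x2 with the trapezoid rule."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for k in range(n):
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * H(x1, -half_width + k * h)
    return total * h

inside = [H_proj(x1) for x1 in (0.0, 1.0, 1.7)]   # all close to 1 / (2 sqrt(pi))
outside = H_proj(2.0)                              # exactly zero
```

Although H itself peaks sharply at the origin, the ridge widens along $x_2$ exactly as fast as it lowers, so the marginal over $x_2$ carries no trace of the maximum.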

If one is only interested in the possible values of $x_1$, independently of $x_2$, then this projection will effectively answer that question. However, our collaborators are also interested in understanding which combinations of model parameters are most likely. That is, we are tasked with communicating facts about the original function in its original domain, using a representation in a projected space.

This problem can also occur in 2D projections of higher-dimensional objects. Some relationships between parameters are much easier to discern in 3D than in 2D projections. For example, the cup function in figure 1,

$$C(x_1, x_2, x_3) = \exp\{-x_1^2 - x_2^2 - x_3^2 - 100\,(x_1^2 + x_2^2 - x_3)^2\},$$

has a maximum region that looks like a cup or bowl. If we project and plot the samples in a scatterplot matrix, we do not see the cup shape, only a crescent. Seeing this shape requires a 3D (rather than 2D) projection.

Of course, this problem continues all the way up to the dimension of the original space. However, with each increase in display dimension, the viewer can immediately see more complex dependencies among parameters.

These examples hint at the information lost when projecting all points from high-dimensional spaces down to a 2D space and displaying the scatter-plot matrix of all pairs. After using such plots, our collaborators asked explicitly for help with the issue of locating “banana shapes” in high dimension caused by unexpected dependencies among parameters. After several rounds of discussions and exploration of potential mathematical descriptions for a “banana shape”, we developed the approach described here.

2.5. Other projection approaches

One possible approach is to perform n-dimensional clustering to identify regions of high probability points in n-D space, assuming the standard issues with this type of clustering could be overcome (e.g. all points are far from one another along the majority of their dimensions, producing poor distance discrimination). For example, recent advances in multidimensional scaling (MDS) enable the projection and interactive display of data sets with millions of points, such as the ones generated by our collaborators [4]. This approach enables the visual detection of clusters in the data, keeping nearby points in the high-dimensional space nearby in the projection. The resulting projections do not attempt to maintain relationships among individual dimensions, however, so they can produce severe shape distortions in the projected region.

Perhaps more importantly, in this project our collaborators are not asking specifically about cluster locations and relationships between them. Instead, they need to understand the shape of the high probability regions in n-D space and to locate interesting shape features. We initially developed MDS-based techniques to allow our collaborators to study the early phase of the MCMC process as it searches for equilibrium, but they were not well suited to the analysis phase where our collaborators are looking for relationships among multiple dimensions.

Another possibility is the Dimstiller system [5], which provides flexible, interactive, multi-window displays for exploration of high-dimensional parameter spaces. Its authors found that 2D projections along principal-component directions revealed structure not seen in projections done using multidimensional scaling and original-axis directions. Our approach automatically searches a broader set of projection directions to guide the user to regions of interest, in addition to providing 3D display of the selected axes. These techniques could be readily added to Dimstiller and other such systems as an additional workflow and display type. It is not clear how one would implement high-dimensional region selection in Dimstiller, but it could be a preprocess.

Projection Pursuit [6] selects an axis of projection given an “interestingness” function; our approach extends this to a 3D multi-index pursuit with an “interestingness” function appropriate to detecting parameter dependencies.

2.6. n-Dimensional visualization techniques

We and others have used Parallel Coordinates (PC) techniques to display relationships among large sets of parameters [7] [8]. PC places pairs of axes representing individual data attributes in spatial proximity to one another [9]. A data sample is positioned at its attribute value location on each axis. Connecting these locations produces a line that visualizes the sample. Viewers can identify common polylines, which correspond to clusters of data samples with similar attribute values. One possibility would be to select high probability points, then plot them in PC where each axis corresponds to one hyper-dimension. This could enable a viewer to identify points with common polylines, indicating a set of high probability points in a common hyper-region. The detection of relationships in such displays relies heavily on the axis ordering, however, and tracing curves across several intervening axes makes it difficult to recognize these relationships. Because PC performs clustering, it also suffers from the fact that clusters do not necessarily answer our collaborators’ questions about shape understanding and feature detection. Indeed, we initially tried a PC-based approach. Issues our collaborators encountered within that approach motivated us to pursue the projection technique discussed in this paper.

Topology-based approaches such as Landscape Profiles [10] and Topological Spines [11] are very effective for the display of the relative sizes and symmetries between different high-likelihood regions in the high-dimensional parameter space. When we implemented these approaches and ran them on our collaborators’ data sets, we found that there was a single region of high likelihood. Again, these methods do not attempt to maintain between-axis consistency in the projection, so they were not directly usable to explore the questions our collaborators had with respect to the shape of the high-likelihood region and how that informed parameter dependencies.

The XGobi system [12] provides a flexible, interactive environment to explore high-dimensional data sets. It includes the ability to explore principal-component and custom mixtures of the original parameter dimensions, and it allows selection of pairs and triples of dimensions for visualization. Our work fits into their “finding Gestalt” task; it provides an automatic way for our collaborators to estimate useful projection directions and augments it with a pre-filtering of the data in high-dimensional space that removes irrelevant points prior to projection. These techniques could be easily added to XGobi or similar systems, and they would benefit from such systems’ ability to animate the transition from one projection to another.

Glyphs are often used for multivariate visualization [13]. In a dataset with high spatial dimensionality, however, even multivariate glyph approaches need to project data elements into the display space (e.g. onto a 2D plane or into a 3D volume). Glyphs are normally used to visualize multiple attribute values, and usually after a spatial embedding has been defined. For example, a common approach would use properties of color, texture, and motion to visualize multiple attribute values attached to each data element [14, 15]. Although our data elements have only a single likelihood attribute, if more attributes were provided, a multivariate glyph approach could be considered to represent these multiple values. This would still require a way to project the n-D elements into 3D, however.

3. Methodology

Our method does three things beyond the standard all-points 2D projections that are common in our collaborators’ workflow. We summarize the approach here and provide details in the following sections.

First, we select a high-likelihood region in the original high-dimensional space prior to projection into the lower-dimensional space. This avoids the information loss incurred when projecting first and then selecting a high-likelihood region, enabling our method to display important parameter relationships in the original space.

Second, we project into 3D rather than 2D to preserve as much information about dependencies as can be effectively comprehended by the human visual system. Because the human visual system is attuned to perceiving surfaces rather than volumetric data, we designed a visualization that produces a surface in three dimensions. (Volume display provides the ability to see inside the volume but hampers shape perception because the lack of occlusion makes relative depths hard to perceive clearly.)

Third, we select an initial projection by maximizing a metric that prefers axis sets that have more interesting dependencies between the parameters. Rather than seeking any particular shape (“banana”), this metric penalizes simpler shapes that can be explained purely in terms of covariance and selects ones whose relationships are not simple to describe (“not an orange”). This results in the detection of the most-interesting shape, whether it is an apple or a pear or a strawberry. The scientist is free to explore all sets of three parameter axes, and we provide an interface for them to select among them. Additionally, the axis-selection procedure can explore the space of orthogonal linear combinations of axes, locating interesting parameter dependencies that are not present along any axis-aligned or principal-component projection.

3.1. Percentile Region Selection

Although it can be used in other ways, we describe our algorithm within the context of our collaborators’ workflow to provide a concrete example. Each collaborator provides us with a function $F(\vec{x})$ which, given a point $\vec{x}$ in parameter space, computes the likelihood density value, L, of the model at that point. When the simulation is fast enough, it is used directly. For slower simulations, a Gaussian mixture model is used as a rapidly computable emulator.

We first apply a Metropolis–Hastings [16][17] approach to perform MCMC-based integration in the parameter space. We run MCMC integration for one million steps, each step calling F, to produce one million points in parameter space (each of which is annotated by its likelihood density). This step takes about twenty minutes on a single processor for the emulator used in the galaxy-formation study, but can be linearly parallelized; it is a pre-process that is run once for each model.

We then sub-sample the points down to ten thousand for the purpose of analysis and visualization. (We cannot simply use the first ten thousand MCMC steps because the process will not have converged.) Because of the properties of MCMC integration, the local density of points is proportional to the local likelihood density L. Each point remains annotated with its actual value of L.

We first select the region of parameter space containing the 95% of sample points with the highest likelihood values ($R \subset \mathbb{R}^p$). (The particular percentile is a user-selectable level that defaults to 95%.) The pointwise labeling with L lets us select points within the 95th percentile without having to estimate local point densities in high-dimensional parameter space.

To select the points, we first find the 5% order statistic on likelihood, which is the likelihood threshold above which 95% of the points in our sample lie. This threshold value is the greatest lower bound of likelihood density function values within R. After discarding the points below the threshold, we project the remaining samples into three dimensions using a linear orthonormal projection. Around these points we compute and display a tight-fitting surface as described in the next section.
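The threshold selection can be sketched in a few lines. The function name and the toy data are assumptions for illustration; a full sort is used here for clarity even though a selection algorithm would find the order statistic faster.

```python
def select_percentile_region(points, likelihoods, c=0.95):
    """Keep the fraction c of samples with the highest likelihood.

    Returns (threshold, kept_points), where the threshold is the order
    statistic of rank (1 - c) * N among the likelihood values.
    """
    n = len(points)
    k = round((1.0 - c) * n)              # how many low-likelihood samples to drop
    if k == 0:
        return float("-inf"), list(points)
    v = sorted(likelihoods)[k - 1]        # k-th smallest likelihood value
    kept = [p for p, l in zip(points, likelihoods) if l > v]
    return v, kept

# Toy data: 20 two-dimensional points with distinct likelihoods 1.0 .. 20.0.
points = [(i, 2 * i) for i in range(20)]
likelihoods = [float(i + 1) for i in range(20)]
v, kept = select_percentile_region(points, likelihoods, c=0.95)
```

With 20 samples and c = 0.95, exactly one sample (the one with the lowest likelihood) is discarded. If several samples tie at the threshold value, the strict comparison discards all of them.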

3.2. Finding the boundary surface

The Boundary portion of the PercentileRegionProjection algorithm can use any method that returns a tight-fitting surface around $X_{\mathrm{proj}}$, a set of points in space. We initially convolved the points with a Gaussian to produce a smooth volumetric density distribution and then computed an iso-surface of the resulting density field. This method was very computationally expensive for the large number of sample points and a Gaussian that is the size of the expected features in the data. It also depended on the specification of a threshold value for the iso-surface that changes the tightness of fit. Finally, the resulting surface did not pass through the boundary of the points.

An alternative method, which was used to produce the figures in this paper, calculates the three-dimensional Delaunay triangulation of $X_{\mathrm{proj}}$. This results in a set of tetrahedra, the outside surface of which is the convex hull of $X_{\mathrm{proj}}$. Because we want a tight-fitting surface rather than the convex hull, we remove tetrahedra whose longest edge is longer than the scale set by the user as the smallest feature of interest in the resulting surface. After removing these tetrahedra, the surface is the boundary of the resulting simplicial complex.
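The culling and boundary-extraction steps can be sketched independently of the triangulation itself. The code below assumes the tetrahedra are already available as 4-tuples of point indices (for example from a Delaunay routine in a geometry library); the function name and the hand-made input are illustrative and do not form a true Delaunay triangulation.

```python
from collections import Counter
from itertools import combinations
import math

def boundary_faces(points, tetrahedra, max_edge):
    """Drop tetrahedra with an edge longer than max_edge, then return the
    triangles that belong to exactly one surviving tetrahedron."""
    kept = [
        tet for tet in tetrahedra
        if max(math.dist(points[i], points[j])
               for i, j in combinations(tet, 2)) <= max_edge
    ]
    face_count = Counter(
        tuple(sorted(face)) for tet in kept for face in combinations(tet, 3)
    )
    # Interior faces are shared by two tetrahedra; boundary faces by one.
    return sorted(face for face, count in face_count.items() if count == 1)

# Hand-made input: two tetrahedra sharing face (1, 2, 3), plus one long sliver.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (9, 9, 9)]
tetrahedra = [(0, 1, 2, 3), (1, 2, 3, 4), (1, 2, 4, 5)]
faces = boundary_faces(points, tetrahedra, max_edge=2.0)
```

The sliver reaching out to the far point is removed by the edge-length test, and the shared face (1, 2, 3) is interior, so six triangles remain on the boundary.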

PercentileRegionProjection(L, N, c, T)
    // L = the likelihood function to be visualized.
    // N = the number of samples needed to sample L.
    // c ∈ [0, 1] = the percentile to be visualized.
    // T ∈ R^(3×p) = projection matrix.
    X = Metropolis-Hastings(L, N)                        // X ⊂ R^p, |X| = N
    v = FindOrderStatistic({L(x) : x ∈ X}, (1 − c)N)
    X′ = {x ∈ X : L(x) > v}                              // X′ ⊂ R
    Xproj = {Tx : x ∈ X′}
    S = Boundary(Xproj)
    return S

The parameter ranges of the model may have very different scales. To make them visually similar and prevent any dimension from being visually imperceptible, we linearly scale the x, y, and z coordinates prior to calculating the boundary. There are two candidate mappings: rescaling the values onto the unit interval or using the standard score (shift the mean to zero and scale by the standard deviation). Either map makes the scale of all projected dimensions approximately the same. We carry along the original parameter values at each projected point, enabling the viewer to query the original values and ranges.
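Both candidate mappings can be sketched per coordinate as follows. The helper names and the three sample points are hypothetical, and the sketch assumes no coordinate is constant (otherwise the divisions would fail).

```python
import statistics

def rescale_unit(values):
    """Map values linearly onto the unit interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def rescale_standard(values):
    """Standard score: shift the mean to zero, scale by the standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

def rescale_columns(points, rescale):
    """Apply a one-dimensional rescaling to each coordinate independently."""
    columns = [rescale(list(col)) for col in zip(*points)]
    return list(zip(*columns))

# Made-up projected points whose three coordinates differ by many orders of magnitude.
points = [(0.0, 2e-11, 5.0), (56.0, 1e-10, 12.0), (112.0, 1.8e-10, 19.0)]
unit = rescale_columns(points, rescale_unit)
standard = rescale_columns(points, rescale_standard)
```

The unit-interval map is sensitive to outlying samples because it depends only on the extremes, whereas the standard score depends on all samples; either choice is a linear map per axis, so it does not bend the projected shape.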

3.3. Choosing the optimal axis-aligned projection

The surface generated by PercentileRegionProjection is the boundary of the projection into $\mathbb{R}^3$ of the 95th percentile region in $\mathbb{R}^p$. If we limit ourselves to axis-aligned orthonormal projections, there are p-choose-3 possible projections. We let the user select any three axes and display the resulting projection, enabling them to visually explore all three-way interactions among parameters.

Our scientist collaborators expressed particular interest in projection directions that exhibit complex features that cannot be described simply in terms of covariance among sample points. They expect these projections to contain the most scientifically interesting features. Our statistician collaborators refer to distributions with complex features as “banana distributions” because a common such distribution of points in two dimensions resembles a banana. The previously mentioned cup function also exhibits non-simple dependence.

After several incomplete attempts to positively describe what was meant by the term “banana distribution”, we changed our approach and instead found a metric that measures the extent to which a shape is uninteresting; we then look for surfaces that are the least uninteresting. The most uninteresting shape is an ellipsoid, indicating independent parameters, in which case the underlying distributions are well described by their mean value and covariance so that no interesting dependencies between parameters are present. The least interesting ellipsoid is a sphere, where all of the variances match. We also want the metric to locate surfaces that are not homotopic to a sphere because those shapes are certainly interesting. So, rather than seeking “banana” shapes, our metric seeks “non-orange” shapes.


To measure how un-sphere-like a shape is, we measure the membrane energy [18] of the surface. Because this is proportional to the surface area, we normalize by computing the ratio of the square root of the surface area to the cube root of the volume (the normalized shape index I).

$$I = \frac{1}{\sqrt[6]{36\pi}} \, \frac{\sqrt{\mathrm{Area}}}{\sqrt[3]{\mathrm{Volume}}}$$

Figure 3: Comparison of the normalized shape index I for four common shapes. (Panel labels recovered from the figure: I = 1.19, I = 1.00, I = 1.22, I = 1.09.)
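A short sketch makes the normalization concrete: with the factor $1/\sqrt[6]{36\pi}$, a sphere of any radius scores exactly 1 and every other closed surface scores higher. The function name is an assumption, and the cube is our own example rather than one of the shapes in Figure 3.

```python
import math

def shape_index(area, volume):
    """Normalized shape index: sqrt(area) / cbrt(volume), scaled so a sphere gives 1."""
    return math.sqrt(area) / ((36.0 * math.pi) ** (1.0 / 6.0) * volume ** (1.0 / 3.0))

def sphere_index(r):
    return shape_index(4.0 * math.pi * r * r, 4.0 / 3.0 * math.pi * r ** 3)

small_sphere = sphere_index(1.0)       # 1, up to rounding
large_sphere = sphere_index(250.0)     # also 1: the index is scale invariant
unit_cube = shape_index(6.0, 1.0)      # greater than 1
```

Scale invariance matters here because the projected coordinates are rescaled before the surface is built; the index should respond to shape, not to size.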

Because the surface is generated from a discrete sampling, it tends to have small-scale noise on its surface that is not a result of the underlying distribution. To remove this noise, we apply a smoothing step before calculating the shape index. (We do not include this smoothing in the final visualization, which shows the raw underlying surface.) The scale of the smoothing depends on the user-specified minimum interesting feature size that was used to cull tetrahedra above.

Our selection algorithm projects the points in each of the p-choose-3 different directions and measures the un-sphere-ness of the resulting surfaces. We select the least sphere-like surface and present to the user the three parameters that were used to project the points as well as the surface and the set of projected points. The user can also select any other sets of axes and see the interactions among them to further explore parameter dependencies.

OptimalSurfaceProjection(L, N, c)
    X = Metropolis-Hastings(L, N)                        // X ⊂ R^p
    v = FindOrderStatistic({L(x) : x ∈ X}, (1 − c)N)
    X′ = {x ∈ X : L(x) > v}
    T′, w′ = None, −∞
    foreach T ∈ the p-choose-3 axis-aligned projections
        Xproj = {Tx : x ∈ X′}
        Xproj = Rescale(Xproj)
        S = Boundary(Xproj)
        S = Smooth(S)
        w = I(S)
        if w > w′
            T′, w′ = T, w
    S′ = Boundary(Rescale({T′x : x ∈ X′}))
    return S′, T′
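The search loop can be sketched with the scoring step left pluggable. In the sketch below, `score` stands for the whole Rescale, Boundary, Smooth, and shape-index chain; the `toy_score` used in the example is a deliberately simple placeholder (sum of coordinate ranges) and is not the shape index. All names and data are assumptions.

```python
from itertools import combinations
import math

def best_axis_triple(points, score):
    """Return (best_axes, best_score) over all p-choose-3 axis triples."""
    p = len(points[0])
    best_axes, best_score = None, -math.inf
    for axes in combinations(range(p), 3):
        # An axis-aligned projection simply keeps three of the p coordinates.
        projected = [tuple(x[i] for i in axes) for x in points]
        w = score(projected)
        if w > best_score:
            best_axes, best_score = axes, w
    return best_axes, best_score

def toy_score(projected):
    # Placeholder only: sum of the coordinate ranges of the projected points.
    return sum(max(col) - min(col) for col in zip(*projected))

# Made-up 5-D samples; axes 1, 3 and 4 have the largest ranges.
points = [(0, 0, 0, 0, 0), (1, 5, 1, 7, 9), (0.5, 2, 0.2, 3, 4)]
best_axes, best_score = best_axis_triple(points, toy_score)
n_projections = math.comb(5, 3)   # number of triples examined for p = 5
```

Each triple is scored independently of the others, which is why the search parallelizes across projections.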

3.4. Optimal Non-Axis-Aligned Projections

It is possible that the parameters chosen by the scientists are not the fundamental parameters of the phenomenon. To search for these more-fundamental axes, and to more concisely represent the parameter space, our collaborators sometimes use principal component analysis to search for combinations that are particularly expressive.

To enable similar searches for the most interesting dependencies between parameters, a variation on our technique rotates the projection direction to consider many randomly chosen non-axis-aligned projections into $\mathbb{R}^3$. The user can specify how many such directions to sample, providing the capability to explore a much broader range of directions than are considered in the original-axis and principal-component directions. This can reveal interesting parameter dependencies in directions that were not previously being investigated by our collaborators.

OptimalSurfaceProjection2(L, N, c, n)
    X = Metropolis-Hastings(L, N)                        // X ⊂ R^p
    v = FindOrderStatistic({L(x) : x ∈ X}, (1 − c)N)
    X′ = {x ∈ X : L(x) > v}
    X′ = Rescale(X′)
    T′, w′ = None, −∞
    foreach T ∈ GenerateRandomProjections(n)
        S = Boundary({Tx : x ∈ X′})
        S = Smooth(S)
        w = I(S)
        if w > w′
            T′, w′ = T, w
    S′ = Boundary({T′x : x ∈ X′})
    return S′, T′

This variation has the advantage that it may find combinations of parameters with interesting relationships. It has the disadvantage that the displayed space is more difficult for the viewer to interpret.
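The paper does not specify how GenerateRandomProjections draws its directions. One plausible construction, shown here purely as an assumption, orthonormalizes three Gaussian random vectors with Gram-Schmidt, which yields a 3-by-p matrix with orthonormal rows; the function names, the extra parameter p, and the seed are all hypothetical.

```python
import math
import random

def random_projection(p, rng, r=3):
    """Draw r orthonormal rows in R^p by Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < r:
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for u in rows:
            # Remove the component of v along each previously accepted row.
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:                 # skip the (unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows

def generate_random_projections(n, p, seed=0):
    """Return n random 3-by-p projection matrices (each a list of rows)."""
    rng = random.Random(seed)
    return [random_projection(p, rng) for _ in range(n)]

projections = generate_random_projections(n=4, p=6)
```

Keeping the rows orthonormal means each projection is a rigid rotation followed by dropping coordinates, so distances in the displayed space are never stretched relative to the rescaled parameter space.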

4. Applications

We implemented our approach in an open-source visualization toolkit, inserted it into an open-source application, and tested it on real parameter-space searches from two different science domains.

4.1. Implementation in VTK

We implemented OptimalSurfaceProjection and PercentileRegionProjection as filters for the Visualization Toolkit (VTK) [19]. The first advantage of using this framework is that it is easy to integrate our algorithms into visualization programs that rely on VTK, such as ParaView [20]. The second advantage is that it makes available implementations of many useful algorithms. We used vtkThresholdPoints for the percentile filter, vtkDelaunay3D and vtkDataSetSurfaceFilter to create a surface from points in space, vtkSmoothPolyDataFilter to smooth the surface, and vtkMassProperties to calculate the normalized shape index.

We also implemented the algorithm within an extended Visualization Workbench that we developed [21]. This open-source tool extends the ParaView visualization program and makes the technique directly available to our collaborators and to the broader scientific community.

Figure 4: Seconds taken to compute optimal axes for ten thousand points on a single processor vs. number of axes considered.

Figure 4 shows that when run on a single processor for ten thousand points, the optimal-projection code takes less than a minute to select the optimal three dimensions from nine. On our collaborators’ 5- and 6-D data sets, it takes less than fifteen seconds. Searching PCA coordinates requires another 8 to 12 seconds, and up to five additional random projections can be tested every 8 to 12 seconds. This search is linearly parallelizable because each projection is independent. This turn-around time enables our collaborators to interactively test different thresholds on their laptops.

Our implementation can be found packaged with our custom version of ParaView, the MADAI Visualization Workbench [21].

4.2. Application to the Galaxy Formation Model

The first science domain tested was a model of galaxy formation [22]. This model starts with an interaction tree that describes how dark-matter particles combined over time to finally form a Milky-Way-like dark-matter halo, the invisible scaffolding of the galaxy. Subsequently, it uses this tree to simulate the time evolution of the baryonic matter that lies within different clumps of dark matter. There are a number of parameters that control the evolution of the baryons in these simulations.

We list the model parameters here by name and function (of meaning to the scientist, just names for the purpose of this paper): Zr, how massive a dark-matter halo must be soon after the big bang to form stars; fbary, the mass fraction of baryons assigned to each dark-matter halo; fescp, the escape factor of metals; yfe, the amount of iron released in type II supernovae; and sfe, the specified star formation efficiency.

When our collaborator looked at the surface created by PercentileRegionProjection, he found it to be more effective than the scatterplot matrix for visualizing and exploring his data. The first important thing he noticed was that, as expected, the optimal axis-aligned projection automatically selected the parameters that show the most complicated interactions. Through use of color (as seen in figure 5), he was able to visualize relations between four parameters at once. For this particular problem, exposing and visualizing nonlinear coupling between parameters of the physical processes being considered is of key importance, because multiple combinations of different parameters could reproduce the observational data equally well. Seeing these couplings indicates which physical processes can be better constrained with a given observational data set.

This goal was quickly achieved by our collaborator with optimal projection, avoiding the burden of exploring multiple scatterplot matrices that only show two parameters at a time. (Scatterplot exploration also requires mental reconstruction from ambiguous projections.) The projected 95th-percentile surface tightly encloses the region where the scientists’ model is likely to reproduce the observational data reasonably well. Thus, the observed interactions between parameters were meaningful to him. For example, it became immediately clear that, while parameters Zr and fbary are strongly non-linearly coupled, parameters fbary and sfe have a more linear relationship. A second important feature reported by the scientist was the ease with which he was able to explore more restrictive iso-likelihood optimally projected surfaces. Simply by modifying the value of the percentile for the region selection, he could quickly explore whether the previously observed couplings were preserved as he selected more restrictive cuts. He reports that the insight he gained using our method is not easily achieved by looking at the 2D scatterplots. The regions of interest and observed couplings showed the scientist where to further explore the data by running the models, thus probing them more closely.
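Changing the percentile amounts to re-selecting the high-likelihood samples in the full parameter space and projecting again. The following is a minimal sketch of that selection step, assuming the region is approximated by the fraction of samples with the highest likelihood values. That definition, the function names `percentile_region` and `project`, and the synthetic Gaussian likelihood are illustrative assumptions rather than details of the implementation. Building the surface around the projected points is omitted.

```python
import math
import random


def percentile_region(samples, likelihoods, percentile):
    """Return the `percentile` percent of samples with the highest likelihood.
    Selection happens in the full parameter space, before any projection."""
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    keep = max(1, round(len(samples) * percentile / 100.0))
    order = sorted(range(len(samples)), key=lambda i: likelihoods[i], reverse=True)
    return [samples[i] for i in order[:keep]]


def project(points, axes):
    """Axis-aligned orthographic projection: keep only the chosen coordinates."""
    return [tuple(p[a] for a in axes) for p in points]


# Synthetic stand-in for a 5-parameter likelihood: an isotropic Gaussian.
rng = random.Random(1)
samples = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(1000)]
likelihoods = [math.exp(-0.5 * sum(x * x for x in s)) for s in samples]

region_95 = percentile_region(samples, likelihoods, 95)
region_50 = percentile_region(samples, likelihoods, 50)  # a more restrictive cut

shadow_95 = project(region_95, axes=(0, 1, 4))
shadow_50 = project(region_50, axes=(0, 1, 4))
print(len(shadow_95), len(shadow_50))
```

Under this assumption, a more restrictive cut always yields a subset of the less restrictive one, so the projected regions nest and couplings can be compared across thresholds.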


[Figure 5 panels: scatterplot matrix with axes Zr (5 to 19), Fescp (0 to 112), Fbary (0 to 0.2), sfe (2e-11 to 1.8e-10), yfe (0.04 to 0.2), and likelihood.]

Figure 5: A comparison of the scatterplot matrix and Optimal Percentile Region Projection of the galaxy formation model likelihood. With Optimal Percentile Region Projection, we visualize the complex relationship among all four parameters. The algorithm chose to project into the Zr, fbary, sfe space, and we then colored the surface by fescp.


4.3. Application to the RHIC Collision Model

The second science domain model tested was a simulation of collisions between sets of gold nuclei at the Relativistic Heavy Ion Collider (RHIC) [23]. The version of the model we tested has six parameters.

The parameters of this model (again, just names for the purpose of this technique but with meaning to our collaborators) are: (dE/dy)pp, the initial energy per rapidity in the diffuse limit compared to the measured value in pp collisions; σsat, which controls how saturation sets in as a function of areal density of the target or projectile; fwn, the relative weight of the wounded-nucleon and saturation formulas for the initial energy density; Fflow, the strength of the initial flow; η/s|Tc, the viscosity to entropy ratio for a temperature T = 170 MeV; and α, the temperature dependence of η/s for temperatures above 170 MeV.

When the RHIC model’s likelihood density function was plotted using a scatterplot matrix, it revealed no interesting pairwise relationships that lie along the initial parameter axes. When viewing these, our collaborators were left with a nagging doubt that perhaps there was an undiscovered dependency lying along some other projection direction. They tried to address this by running principal-component analysis on the parameters and then viewing the pairwise projections in those spaces. In this way, they sampled two sets of linear combinations of the axes to try to discover hidden features.
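The PCA step described above can be sketched as follows: estimate the covariance of the parameter samples, extract the leading principal directions, and re-express the samples in those coordinates before plotting pairs. This is a dependency-free illustration using power iteration with deflation. The function names and synthetic data are assumptions, and the collaborators' actual tooling is not specified here. In practice a linear-algebra library routine would normally replace `leading_eigenpair`.

```python
import math
import random


def covariance(points):
    """Sample mean and covariance matrix of a list of equal-length points."""
    n, p = len(points), len(points[0])
    mean = [sum(pt[j] for pt in points) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for pt in points:
        d = [pt[j] - mean[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return mean, cov


def leading_eigenpair(matrix, iterations=500):
    """Power iteration for a symmetric positive semi-definite matrix.
    Assumes the start vector is not orthogonal to the leading eigenvector."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * matrix[i][j] * v[j] for i in range(p) for j in range(p))
    return eigenvalue, v


def principal_axes(points, k):
    """Top-k principal directions, found by repeated deflation."""
    mean, cov = covariance(points)
    axes = []
    for _ in range(k):
        lam, v = leading_eigenpair(cov)
        axes.append(v)
        # Remove the component just found so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    return mean, axes


def to_pca_coordinates(points, mean, axes):
    """Express each centered point in the basis of the principal directions."""
    return [tuple(sum((pt[j] - mean[j]) * ax[j] for j in range(len(mean)))
                  for ax in axes) for pt in points]


# Synthetic 3-D samples whose main spread lies along the x = y diagonal.
rng = random.Random(2)
points = []
for _ in range(2000):
    t = rng.gauss(0.0, 3.0)
    points.append([t + rng.gauss(0.0, 0.1), t + rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.5)])

mean, axes = principal_axes(points, k=2)
coords = to_pca_coordinates(points, mean, axes)  # pairs ready for a 2-D scatterplot
print(axes[0])
```

For this synthetic data the first direction lies close to the x = y diagonal, a dependency that no single pair of original axes isolates as one coordinate.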

Optimal Percentile Region Projection also showed a relatively compact three-dimensional shape without interesting features. Because it had sampled a large space of potential axis combinations (the user can let the algorithm run as long as they like), because it displays relationships between three axes, and because it directly shows the most-interesting projection direction of all those that have been found, it provided more compelling evidence that the parameter space is well-explained by the statistical correlation values.

Even when Optimal Percentile Region Projection reveals that there are no interesting three-dimensional features, Percentile Region Projection can still be used along with a scalar color map to visualize the relationship between four parameters at a time (see figure 6), or between three parameters and a model output scalar field. Our collaborators found that these visualizations rapidly express information about these relationships that two-dimensional scatterplots do not.

5. Conclusion

It is difficult to understand the potentially complex relationships among parameters in scientific models when there are many parameters. Optimal Percentile Region Projection makes salient features of the high-likelihood regions of parameter space that cannot be seen using other methods, and it shows more clearly when no interesting three-way relationships exist.

By displaying the projection of only the region of high likelihood in the high-dimensional space, rather than the region of high likelihood in the lower-dimensional projected space, we directly address statistical questions about the original parameter space. By choosing the projection that is most different from simple correlation, we save scientists the time and frustration of searching all p-choose-3 possible projections.
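For scale, the number of axis-aligned three-dimensional projections of a p-parameter model is the binomial coefficient below, evaluated for the dimensionalities mentioned in this paper (5, 6, and 9 parameters). This counts only projections along the original parameter axes; PCA and random projection directions add to it.

```latex
\binom{p}{3} = \frac{p(p-1)(p-2)}{6}, \qquad
\binom{5}{3} = 10, \quad
\binom{6}{3} = 20, \quad
\binom{9}{3} = 84 .
```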

This technique extends the display of parameter dependencies from the standard two dimensions up to three geometric dimensions, with an overlaid fourth dimension (input or output) shown using color. It uses orthographic projection, which avoids adding perspective or other more complicated distortions in the projection step.
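An orthographic projection onto three directions is a linear map with orthonormal rows, so no perspective divide is involved, and axis-aligned projection is the special case in which the rows are coordinate axes. The sketch below illustrates this with Gram-Schmidt orthonormalization of randomly drawn directions. The function names are hypothetical, and drawing directions from a Gaussian is an assumption about how random projections might be generated, not a description of the implementation.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors):
    """Modified Gram-Schmidt: turn linearly independent vectors into an
    orthonormal frame spanning the same subspace."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            raise ValueError("vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis


def random_frame(n_dims, rng, k=3):
    """A random orthonormal k-frame in n_dims dimensions (assumed sampling scheme)."""
    return orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(n_dims)]
                           for _ in range(k)])


def orthographic_project(points, frame):
    """Project each point onto the frame: one dot product per output coordinate."""
    return [tuple(dot(p, b) for b in frame) for p in points]


rng = random.Random(3)
frame = random_frame(6, rng)

# Axis-aligned projection onto the first three parameters as a special case.
axis_frame = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]]
point = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

print(orthographic_project([point], axis_frame))
print(orthographic_project([point], frame))
```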

6. Limitations and Future Work

As with all projection techniques, the presented work hides information. Future work will be needed to address features that cannot be visualized in three dimensions. The addition of layered surface textures or glyphs may be able to extend beyond the four dimensions shown here.

The choice of displaying the boundary of the projected region as an opaque surface hides any interior holes in the region when shown in 3D. Revealing these interior voids will require the addition of cutaway slices or other techniques to display nested surfaces [24][25][26].

Our method is targeted at likelihood density functions that have a single region of high value. We focused on this case because the real-world examples from our scientists behaved in this manner. An example of a method that can extract topological information about multiple high-value regions of a scalar function is Topological Spines [11]. Future work could combine these techniques, using Topological Spines to show the distribution of local maxima and our projection technique to display each local region.

7. Acknowledgments

This work was supported by the U.S. National Science Foundation’s Cyber-Enabled Discovery and Innovation Program through grant no. 0941373 and by Sandia National Laboratories contract number 979573.

References

[1] Gelfand, A.E., Smith, A.F. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 1990;85(410):398–409.

[2] Engel, D., Huttenberger, L., Hamann, B. A survey of dimension reduction methods for high-dimensional data analysis and visualization. In: Garth, C., Middel, A., Hagen, H., editors. VLUDS; vol. 27 of OASICS. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany. ISBN 978-3-939897-46-0; 2011, p. 135–149. URL: http://dblp.uni-trier.de/db/conf/vluds/vluds2011.html#EngelHH11.

[3] van der Maaten, L., Postma, E.O., van den Herik, H.J. Dimensionality reduction: A comparative review. 2008.


[Figure 6 panels: scatterplot matrix with axes energy norm. (0.85 to 1.2), σsat (3 to 5), W.N./Sat. frac. (0 to 1), Init. Flow (0.25 to 1.25), η/s (0.02 to 0.5), T dep. of η (0 to 5), and likelihood.]

Figure 6: A comparison of the scatterplot matrix and Optimal Percentile Region Projection of the RHIC model likelihood. The Optimal Percentile Region Projection chose to project into the (dE/dy)pp, fwn, η/s space. Both the scatterplot matrix and the projections reveal that there are no complex relationships among model parameters; however, the Optimal Percentile Region Projection is able to search a much larger space of linear combinations of axes to locate unexpected relationships, so it provides more compelling evidence of a lack of dependencies.


[4] Williams, M., Munzner, T. Steerable, progressive multidimensional scaling. In: INFOVIS: IEEE Symposium on Information Visualization. 2004, p. 57–64. doi:10.1109/INFVIS.2004.60.

[5] Ingram, S., Munzner, T., Irvine, V., Tory, M., Bergner, S., Moller, T. Dimstiller: Workflows for dimensional analysis and reduction. In: VAST: IEEE Symposium on Visual Analytics Science and Technology. 2010, p. 3–10. doi:10.1109/VAST.2010.5652392.

[6] Fodor, I. A survey of dimension reduction techniques. Tech. Rep.; 2002.

[7] David Feng Yueh Lee, L.K.R.T. Matching visual saliency to confidence in plots of uncertain data. TVCG 2010;16(6):980–989.

[8] Jonathan M. Harter Russell M. Taylor II, X.W.C.H.S.B.S.Z. Increasing the perceptual salience of relationships in parallel coordinate plots. In: Proceedings of SPIE Visualization and Data Analysis. 2012, p. T1–T12.

[9] Inselberg, A. Parallel Coordinates: Visual Multidimensional Geometry and its Applications. Springer; 2009.

[10] Oesterling, P., Heine, C., Weber, G.H., Scheuermann, G. Visualizing nd point clouds as topological landscape profiles to guide local data analysis. TVCG 2013;19(3):514–526.

[11] Correa, C., Lindstrom, P., Bremer, P.T. Topological spines: A structure-preserving visual representation of scalar fields. TVCG 2011;17(12):1842–1851.

[12] Buja, A., Cook, D., Swayne, D.F. Interactive high-dimensional data visualization. Journal of Computational and Graphical Statistics 1996;5:78–99.

[13] Borgo, R., Kehrer, J., Chung, D.H., Maguire, E., Laramee, R.S., Hauser, H., et al. Glyph-based visualization: Foundations, design guidelines, techniques and applications. In: Eurographics State of the Art Reports. EG STARs; Eurographics Association; 2013, p. 39–63. URL: http://www.cg.tuwien.ac.at/research/publications/2013/borgo-2013-gly/; http://diglib.eg.org/EG/DL/conf/EG2013/stars/039-063.pdf.

[14] Healey, C.G., Enns, J.T. Attention and visual memory in visualization and computer graphics. TVCG 2012;18(7):1170–1188.

[15] Huber, D.E., Healey, C.G. Visualizing data with motion. In: VIS: IEEE Visualization Conference. 2005, p. 527–534.

[16] Hastings, W.K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 1970;57(1):97–109.

[17] Press, W.H. Numerical Recipes 3rd Edition: The Art of Scientific Computing. Cambridge University Press; 2007.

[18] Crane, K.M. Conformal geometry processing. Ph.D. thesis; California Institute of Technology; 2013.

[19] Schroeder, W., Lorensen, B. Visualization Toolkit: An Object-Oriented Approach to 3-D Graphics. 1st ed.; Upper Saddle River, NJ, USA: Prentice Hall PTR; 1996. ISBN 0131998374.

[20] Squillacote, A. The ParaView Guide: A Parallel Visualization Application. Kitware; 2007. ISBN 9781930934214. URL: http://www.kitware.com/products/books/paraview.html.

[21] The MADAI Collaboration. MADAI Visualization Workbench. http://vis.madai.us/; 2013.

[22] Gomez, F.A., Coleman-Smith, C.E., O’Shea, B.W., Tumlinson, J., Wolpert, R.L. Characterizing the formation history of Milky Way like stellar halos with model emulators. The Astrophysical Journal 2012;760(2):112.

[23] Novak, J., Novak, K., Pratt, S., Vredevoogd, J., Coleman-Smith, C., Wolpert, R. Determining fundamental properties of matter created in ultrarelativistic heavy-ion collisions. arXiv:1303.5769 [nucl-th] 2013.

[24] Interrante, V., Fuchs, H., Pizer, S. Illustrating transparent surfaces with curvature-directed strokes. VIS: IEEE Visualization Conference 1996:211–218.

[25] Weigle, C., Taylor II, R.M. Visualizing intersecting surfaces with nested-surface techniques. VIS: IEEE Visualization Conference 2005:503–510.

[26] Alabi, O.S., Wu, X., Bass, S., Pratt, S., Zhong, S., Healey, C., et al. Exploring ensemble data sets through ensemble surface slicing. Proceedings of SPIE Visualization and Data Analysis 2012;8294:U1–U12.
