
DATA MINING

PHYSICS-BASED FEATURE MINING FOR LARGE DATA EXPLORATION

One effective way of exploring large scientific data sets is a process called feature mining. The two approaches described here locate specific features through algorithms that are geared to those features’ underlying physics.

DAVID S. THOMPSON, JAYA SREEVALSAN NAIR, AND SATYA SRIDHAR DUSI VENKATA
Mississippi State University

RAGHU K. MACHIRAJU, MING JIANG, AND GHEORGHE CRACIUN
Ohio State University

COMPUTING IN SCIENCE & ENGINEERING, JULY/AUGUST 2002
1521-9615/02/$17.00 © 2002 IEEE

Large-scale computational simulations of physical phenomena produce enormous data sets, often in the terabyte and petabyte range. Unfortunately, advances in data management and visualization techniques have not kept pace with the growing size and complexity of such data sets. One paradigm for effective large-scale visualization is browsing regions containing significant features of the data set while accessing only the data needed to reconstruct these regions. To demonstrate the feasibility of this approach, we are currently developing a prototype system, Evita (exploratory visualization, interrogation, and analysis).[1] The cornerstone of this visualization paradigm is a representational scheme that facilitates ranked access to macroscopic features in the data set.

We call the process of detecting those significant features feature mining, and in this article, we propose two paradigms for accomplishing this task. Our intent with both approaches is to exploit the physics of the problem at hand to develop highly discriminating, application-dependent feature detection algorithms and then use available data mining algorithms to classify, cluster, and categorize the identified features. We have also developed a technique for denoising feature maps that exploits spatial-scale coherence and uses what we call feature-preserving wavelets.

The large-data exploration methodology we describe can work for any data that can be transformed to a multiscale representation and consists of features that can be extracted through local operators and aggregated in spatial, scale, and temporal dimensions. The examples we present in this article, however, demonstrate our feature mining approach as applied to steady computational fluid dynamics simulations on curvilinear grids.

Evita

The Evita system consists of three main components: an offline preprocessor, a server, and a client. The preprocessor takes the original data set and its associated grid to produce a compact representation. The compressed bitstream resulting from this offline preprocessing is produced under a fixed priority schedule that permits suitable visualization of the data set. The preprocessor organizes the bitstream in terms of regions of interest (ROIs). Each ROI is a correlated spatiotemporal region that contains a physical feature with an associated ranking. Thus, the preprocessor is essentially a feature mining application that produces a significance map delineating the ROIs.

When an Evita user begins data exploration through the client system, the server initially transmits a background. Then, according to a feature-based priority schedule, it transmits information one feature at a time. Features appear on the client system according to the priority schedule and are incrementally refined over time, resulting in a monotonic improvement in the image quality. For each user-initiated change in the viewpoint, the server generates a new priority schedule and transmits new ROIs. While visualization is under way, the server accepts real-time information from the client and dynamically reorganizes the bitstream to produce the desired priority ranking.

The final component of the system, the client, decodes the reorganized bitstream arriving from the server and produces the visualization. The client can again extract features to gain further insights into the data.

The preprocessing stage requires the entire data set. Given the local nature of the wavelet and feature operators, the data can be accessed by blocks or piecemeal. The system can also use out-of-core and parallel execution to reduce excessive memory buffering of data during preprocessing. However, for the actual client-driven exploration, buffering the entire data set is not necessary. Only the data demarcated by the view frustum is necessary; features can be extracted from that. Thus, exploration does not require a large buffer space.

Several aspects of the Evita project have generated innovative research, for example, its embedded and progressive encoding, server-client architecture, and visualization interactors. In this article, however, we will focus exclusively on how Evita handles feature mining.

What is a feature?

Perhaps the most appropriate response to this question is, “It depends on what you’re looking for.” In general, a feature is a pattern occurring in a data set that is of interest and that manifests correlation relationships between various components of the data. For instance, a shock in a supersonic fluid flow would be considered a significant feature: when a shock occurs, the pressure increases abruptly in the direction of the flow, and the fluid velocity decreases in a prescribed manner. A significant feature also has spatial and temporal scale coherence. In most cases, an adequately resolved feature spans several discrete spatial or temporal increments.

For many applications, generic data mining techniques such as clustering, association, and sequencing can reveal statistical correlations between various components of the data.[2] Returning to the shock example, we could use statistical mining to ferret out associations, but it might be difficult to attach precise spatial associations for the rules discovered. A fluid dynamicist, however, would like to locate features with a rather high degree of certainty. Such qualitative assertions alone will not suffice.

This is where our approach to feature mining comes in: we take advantage of the fact that, for simulations of physical phenomena, the field variables satisfy certain physical laws. We can exploit these kinematic and dynamic considerations to locate features of interest. The resulting feature detection algorithms, by their very nature, are highly application-specific. However, the fidelity improvements garnered by tailoring these highly discriminating feature detection algorithms to the particular application far outweigh any loss of generality. In this context, feature detection is, in essence, a data mining task.

Sidebar: Other Feature-Mining Work

In many ways, our approach and framework parallel those of Kenneth Yip and Feng Zhao.[1] Spatial aggregation is the cornerstone of their approach. All points belonging to an identified region are aggregated to form a subdomain or a region of interest. They propose frameworks that facilitate imagistic reasoning and allow construction of frameworks for imagistic solvers. In our framework, features are extracted as simplicial entities (for example, points, lines, regions); we rely on the visceral potency of visualization algorithms to gain insights.

Yip and Zhao use aggregation and task-specific classification to create an explicit neighborhood relationship graph. Their approach uses redescription to conduct aggregation at higher levels of abstraction. On the other hand, we exploit aggregation, classification, and other tasks (denoising, tracking, and so on) to facilitate exploration of large data.

Furthermore, many of the examples described by Yip and Zhao are for 2D scalar fields; we’ve chosen our examples from more complex flow problems.

Reference (sidebar)

1. K. Yip and F. Zhao, “Spatial Aggregation: Theory and Applications,” J. Artificial Intelligence Research, vol. 5, Aug. 1996, pp. 1–26.

The state of the art in feature detection and mining in simulation data is similar to what existed for image processing when edge detection methods were the main techniques. Much more is now understood, and mining in images is often done in terms of the features, namely edges. This suggests that a blend of data and feature mining methods might have the potential to reduce the burdensome chore of finding features in large data sets.

Feature detection algorithms

In this section, we describe two distinct feature detection paradigms. The common thread is that both are bottom-up feature constructions with underlying physically based criteria. The two perform essentially the same steps, but in different order. As will become evident, it is unlikely that non-physics-based techniques would provide the fidelity needed to locate complex flow field structures.

The feature we’ll focus on is the vortex. (For an excellent review of vortex detection techniques, see Martin Roth.[3]) We all have an intuitive, informal understanding of what a vortex is: a swirling flow pattern around a central point. However, the primary challenge associated with vortex detection is that there is no clear definition of a vortex. Here is one of the literature’s clearest:

A vortex exists when instantaneous streamlines mapped onto a plane normal to the vortex core exhibit a roughly circular or spiral pattern, when viewed from a reference frame moving with the center of the vortex.[4]

Unfortunately, this definition is self-referential: you have to know certain properties about a particular vortex to be able to detect it. For steady 3D flow, the vortex’s orientation is, in general, unknown; for unsteady flows, the velocity of the “reference frame moving with the center of the vortex” is likewise unknown. The underlying difficulty with this definition lies in its global nature. To use this definition, you must know the orientation and velocity of a planar surface moving through space. Furthermore, you must be able to deduce streamline patterns.

Now we’ll show how to apply our two different feature detection paradigms to vortical flows.

Point classification techniques

The first feature detection paradigm, which we call point classification, requires several operations in sequence:

1. Detection by application of a local sensor at each point in the domain

2. Binary classification (verification) of points based on some criteria

3. Aggregation of contiguous regions of like-classified points

4. Denoising to eliminate aggregates that are of insufficient extent, strength, and so on

5. Ranking based on feature saliency

This approach identifies individual points as belonging to a feature and then aggregates them to identify regions that are features. The points are obtained from a tour of the discrete domain and can be in many cases the vertices of a physical grid. The sensor used in the detection phase and the criteria used in the classification phase are physically based point-wise characteristics of the feature of interest. (We could also track these regions in the temporal dimension, but in this article we’ll restrict our attention to feature extraction.)
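The five steps above can be sketched on a small 2D grid as follows. This is a minimal illustration, not the authors’ implementation: the function name `point_classification`, the threshold test used for classification, the 4-connected flood fill, the minimum-size denoising rule, and the mean-sensor-value saliency are all assumptions chosen to keep the example short.

```python
from collections import deque


def point_classification(sensor, threshold, min_size):
    """Run steps 2-5 on a 2D grid of sensor values (the output of step 1).

    All criteria here are illustrative assumptions, not the article's.
    """
    rows, cols = len(sensor), len(sensor[0])
    # Step 2: binary classification of each point (assumed: simple threshold).
    is_feature = [[sensor[i][j] > threshold for j in range(cols)]
                  for i in range(rows)]
    # Step 3: aggregate 4-connected, like-classified points with a flood fill.
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for i in range(rows):
        for j in range(cols):
            if not is_feature[i][j] or seen[i][j]:
                continue
            region, queue = [], deque([(i, j)])
            seen[i][j] = True
            while queue:
                a, b = queue.popleft()
                region.append((a, b))
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    na, nb = a + da, b + db
                    if (0 <= na < rows and 0 <= nb < cols
                            and is_feature[na][nb] and not seen[na][nb]):
                        seen[na][nb] = True
                        queue.append((na, nb))
            regions.append(region)
    # Step 4: denoise by discarding aggregates of insufficient extent.
    regions = [r for r in regions if len(r) >= min_size]

    # Step 5: rank by saliency (assumed: mean sensor value over the region).
    def saliency(region):
        return sum(sensor[a][b] for a, b in region) / len(region)

    return sorted(regions, key=saliency, reverse=True)
```

In practice the sensor, the classification criterion, and the saliency measure would all be replaced by the physically based quantities appropriate to the feature of interest.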

Now let’s look at a vortex detection technique that uses the point classification approach. This technique uses the eigenvalues of the local velocity gradient tensor.[5] Under a limited set of conditions, swirling flow is characterized by regions where the eigenvalues of the velocity gradient tensor are complex. Carl Berdahl and David Thompson defined a swirl parameter that estimates “the tendency for the fluid to swirl about a given point.”[5] The value of this parameter is

$$\tau = \frac{\operatorname{Im}(\lambda_{1,2})\, L}{V_{\mathrm{conv}}},$$

where $\operatorname{Im}(\lambda_{1,2})$ is the imaginary part of the complex conjugate pair of eigenvalues, $V_{\mathrm{conv}}$ is the velocity in the plane whose normal is the real eigenvector, and $L$ is the characteristic length of the swirling region. The swirl has a nonzero value in regions containing vortices and attains a local maximum in the core region.

In this point classification algorithm, the detection step consists of computing the eigenvalues of the velocity gradient tensor at each field point. The classification step consists of checking for complex eigenvalues and assigning a swirl value if they exist. The aggregation step then defines the region containing the vortex.

This method’s primary shortcoming is that it, like all eigenvalue-based vortex detection techniques, can generate false positives. Its local nature makes it unable to discriminate between locally curved streamlines and closed streamlines characteristic of a vortex. Other features, such as shocks, are more amenable to the point classification framework.
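The detection and classification steps of the swirl-based technique can be illustrated with a simplified 2D sketch. It is only an approximation of the idea under stated assumptions: the article’s definition is three-dimensional (the convective velocity lies in the plane normal to the real eigenvector), whereas here the velocity gradient is a 2×2 matrix and `v_conv` is simply taken to be the in-plane speed. The function name and the zero return value for non-swirling points are hypothetical choices.

```python
import math


def swirl_parameter_2d(grad, v_conv, length):
    """Swirl value tau = Im(lambda) * L / V_conv for a 2x2 velocity gradient.

    grad = ((du/dx, du/dy), (dv/dx, dv/dy)).
    Returns 0.0 when the eigenvalues are real (no swirl detected).
    """
    (a, b), (c, d) = grad
    trace = a + d
    det = a * d - b * c
    # Eigenvalues are trace/2 +/- sqrt(disc); they are complex when disc < 0.
    disc = trace * trace / 4.0 - det
    # Assumption: treat zero convective speed as "no swirl value assigned"
    # to avoid dividing by zero.
    if disc >= 0.0 or v_conv <= 0.0:
        return 0.0
    imag = math.sqrt(-disc)
    return imag * length / v_conv
```

For solid-body rotation with angular velocity 2 (gradient `((0, -2), (2, 0))`), `v_conv = 4`, and `length = 1`, the eigenvalues are ±2i and the function returns 0.5; for a pure shear such as `((0, 1), (0, 0))` the eigenvalues are real and it returns 0.0.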

Aggregate classification techniques

We can best incorporate the global information needed to define a vortex into our second feature detection paradigm, the aggregate classification approach. Aggregate classification follows a somewhat different sequence of operations:

1. Detection by application of a local sensor at each point in the domain

2. Aggregation of contiguous regions of probable candidate points

3. Binary classification (verification) of each aggregate based on some criteria

4. Denoising to eliminate aggregates that are of insufficient extent, strength, and so on

5. Ranking based on feature saliency

This approach identifies individual points as being probable candidate points in a feature and then aggregates them. The classification algorithm is applied to the aggregate using physically based regional criteria to determine whether the candidate points constitute a feature.

We have recently developed an aggregate classification vortex detection technique.[6] We based its detection step on an idea derived from a lemma in combinatorial topology: Sperner’s lemma states, “Every properly labeled subdivision of a simplex σ has an odd number of distinguished simplices.”[7] In other words, given a convex set in n dimensions, triangulate it into subtriangles and assign to each vertex of the subtriangles a label from 1, 2, ..., n + 1. If the initial vertices of the convex set are completely labeled, then there exist an odd number of completely labeled subtriangles within the convex set. A subtriangle is completely labeled if it receives all n + 1 labels.

The idea behind Sperner’s lemma is to deduce the properties of a triangulation based solely on the labeling of the vertices. In a dual fashion, this approach deduces the behavior of a vector field based solely on the labeling of the velocity vectors. In particular, velocity vectors around core regions exhibit certain flow patterns unique to vortices, and it is precisely these flow patterns that we search for in the computational grid. Not surprisingly, our approach is related to critical point theory. A critical point is a point at which the velocity is zero, that is, where the local slope of the streamline is undefined. However, critical points alone are not sufficient to detect a vortex.

For each grid point, our algorithm examines its four immediate neighbors in 2D (six in 3D) to see whether the neighboring velocity vectors point in three or more direction ranges (that is, to see if they form a complete triangle). 3D vortex core regions are much more difficult to detect than their 2D counterparts: to do so, we must identify a core direction (the normal to a plane) and apply our 2D algorithm to the neighboring vectors projected onto that plane. We call this plane the swirl plane because instantaneous streamlines projected onto it exhibit a swirling pattern.
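A rough sketch of the 2D neighbor test described above follows. The details are assumptions made for illustration: direction ranges are taken to be equal angular sectors starting at the positive x-axis, zero vectors are skipped because their direction is undefined, and the names `direction_label` and `is_core_candidate` are hypothetical. The default of four direction ranges follows the article’s remark that four have proved sufficient for the cases considered to date.

```python
import math


def direction_label(u, v, n_ranges):
    """Quantize the direction of (u, v) into one of n_ranges equal sectors."""
    angle = math.atan2(v, u) % (2.0 * math.pi)
    # Clamp guards against angle rounding up to exactly 2*pi.
    return min(int(angle / (2.0 * math.pi / n_ranges)), n_ranges - 1)


def is_core_candidate(velocity, i, j, n_ranges=4):
    """True if the four immediate neighbors of interior point (i, j)
    point in three or more distinct direction ranges.

    velocity[i][j] is a (u, v) tuple.
    """
    labels = set()
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        u, v = velocity[i + di][j + dj]
        if u == 0.0 and v == 0.0:
            continue  # direction undefined at a critical point
        labels.add(direction_label(u, v, n_ranges))
    return len(labels) >= 3
```

For a solid-body rotation field centered on the point being tested, the four neighboring vectors point in four mutually perpendicular directions and the point is flagged; in a uniform flow all neighbors share one label and it is not.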

Figure 1a shows the 2D algorithm on a 2D structured grid with the detected core region colored gray; Figure 1b shows the 3D algorithm, along with the swirl plane, and the complete tetrahedron A, B, C, E. Potential candidates for the core direction vector include the vorticity vector and the eigenvector corresponding to the real eigenvalue of the velocity gradient tensor.

Figure 1. Topology-based vortex core region detection algorithm: (a) 2D algorithm and (b) 3D algorithm. [Panel labels: grid indices i - 1, i, i + 1; j - 1, j, j + 1; k - 1, k, k + 1; vectors A, B, C, E; swirl plane.]

The final point to consider in the detection step is the issue of direction quantization. Direction quantization refers to the number of possible direction ranges in which a vector can point, that is, the number of possible labels a vector can receive. Given a continuous vector field defined on a discrete 2D grid, it is not always sufficient to use only three direction ranges to label the vectors. For the cases considered to date, four direction ranges have proved sufficient. An added benefit of direction quantization is that it makes the 3D algorithm relatively insensitive to the core direction, and approximate core directions can be just as effective as exact core directions.

Our technique segments candidate core regions by aggregating points identified from the detection phase. We then classify (or verify) these candidate core regions based on the existence of swirling streamlines surrounding them. (For features that lack a formal definition, such as the vortex, we must choose the verification criterion so that it concurs with the intuitive understanding of the feature. In this case, verifying whether a candidate core region is a vortex core region requires checking for any swirling streamlines surrounding it.) Checking for swirling streamlines is a global (or aggregate) approach to feature classification (or verification) because swirling is measured with respect to the core region, not just individual points within the core region.

In two dimensions, checking for swirling streamlines is fairly straightforward. Using operators from differential geometry, we can measure swirling by computing a streamline’s winding angle with respect to a candidate core region. The winding angle is a measure of the total curvature, or the signed angle of rotation, of a planar curve with respect to a reference point. Therefore, a winding angle of 2π means that the planar streamline has completely swirled around the candidate core region, making it the natural choice for the classification criterion in two dimensions.
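One way to read the 2D criterion is sketched below for a streamline stored as a polyline. The interpretation is an assumption: the winding angle is computed as the accumulated signed angle swept by the vector from the reference point to successive streamline points. The function names and the small numerical tolerance are hypothetical.

```python
import math


def winding_angle(streamline, center):
    """Signed angle (radians) swept about `center` by a polyline of (x, y) points."""
    cx, cy = center
    total = 0.0
    for (x0, y0), (x1, y1) in zip(streamline, streamline[1:]):
        ax, ay = x0 - cx, y0 - cy
        bx, by = x1 - cx, y1 - cy
        # atan2(cross, dot) is the signed angle from vector a to vector b.
        total += math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    return total


def swirls_around(streamline, center, tol=1e-9):
    """Classification test: has the streamline made a full revolution?"""
    return abs(winding_angle(streamline, center)) >= 2.0 * math.pi - tol
```

A streamline that closes on itself around the reference point yields a winding angle of magnitude 2π, while a streamline that merely passes by the point sweeps less than π and fails the test.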

In three dimensions, however, checking for swirling streamlines is much more difficult, because we cannot extend the winding angle operator into higher dimensions, and vortices in three dimensions can exhibit geometries that bend or twist in various ways. To address these issues, our verification algorithm computes the tangent vector and probe vector for each point along the streamline. The probe vector is oriented to point at or near the core region, and it retrieves the core direction vector from that location. The algorithm locally aligns the retrieved core direction vector with the z-axis and then applies the same transformation to the tangent vector. Essentially, the purpose of this alignment step is to locally straighten any curved vortices along the z-axis. The algorithm then projects the transformed tangent vector onto the (x, y)-plane; therefore, if the streamline is swirling, the projected tangent vectors make a complete revolution in the (x, y)-plane. Thus, the classification criterion in three dimensions is a signed angle of rotation of 2π, in the (x, y)-plane, by the projected tangent vectors.
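The alignment-and-projection step can be sketched as follows. This is a simplified illustration under several assumptions: the core direction retrieved by each probe vector is supplied directly as an input list (the probing itself is not modeled), the alignment is done with a Rodrigues rotation, and the helper names are hypothetical.

```python
import math


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def align_with_z(core, vec):
    """Apply to vec the rotation that takes the core direction onto +z."""
    norm = math.sqrt(_dot(core, core))
    c = tuple(x / norm for x in core)
    axis = _cross(c, (0.0, 0.0, 1.0))
    sin_t = math.sqrt(_dot(axis, axis))  # sine of the rotation angle
    cos_t = c[2]                         # cosine of the rotation angle
    if sin_t < 1e-12:
        # Core already along +z, or along -z (rotate by pi about x).
        return tuple(vec) if cos_t > 0 else (vec[0], -vec[1], -vec[2])
    k = tuple(x / sin_t for x in axis)
    kxv = _cross(k, vec)
    kdv = _dot(k, vec)
    # Rodrigues rotation formula.
    return tuple(vec[i] * cos_t + kxv[i] * sin_t + k[i] * kdv * (1.0 - cos_t)
                 for i in range(3))


def projected_rotation(tangents, core_dirs):
    """Signed angle swept in the (x, y)-plane by the aligned, projected tangents."""
    total, prev = 0.0, None
    for t, c in zip(tangents, core_dirs):
        x, y, _ = align_with_z(c, t)
        if prev is not None:
            total += math.atan2(prev[0] * y - prev[1] * x,
                                prev[0] * x + prev[1] * y)
        prev = (x, y)
    return total
```

A streamline would then be accepted as swirling when the magnitude of `projected_rotation` reaches 2π. For a helix wound once around a straight core, the result is 2π whether the core lies along the z-axis or along some other direction, which is the point of the alignment step.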

Examples

Now we’ll demonstrate the two techniques we’ve discussed by applying them to a complicated, delta-wing flow field that has undergone vortex burst, a condition characterized by the rapid expansion of the vortex. Figure 2 shows two views of the isosurfaces of the point classification–based swirl parameter, with the complex structures of the burst vortex clearly evident.

Figure 2. Point-based vortex detection algorithm applied to a delta wing with vortex burst.

Figures 3 and 4 show results generated using the aggregate classification technique. Figure 3a shows the candidate core regions. The yellow aggregated regions are those identified by the classification step as being actual cores; the green aggregated regions failed the verification step. Figure 3b shows the verified core regions enclosed in swirling streamlines. Figure 4 illustrates the verification technique. The cyan vectors in the upper image are the streamline tangents, and the orange vectors are the probe vectors. The probe vectors interrogate the core region for the local core direction vector. The bottom image shows that the projected tangent vectors satisfy the 2π swirling criterion.

Wavelet-based denoising

Once we have obtained a feature map, the next step is to filter and rank the ROIs systematically. The visualization process should not accord significance to features that are weak or of small spatial extent. In addition, certain types of features require several grid points for the simulation to adequately resolve them; the visualization should also ignore features that don’t meet this criterion. Our basic strategy, which eliminates all these insignificant features, is to accord ROI status only to features that persist over several spatial scales. In other words, we exploit the scale coherence of significant features to determine whether a detected feature is, in fact, an ROI.

Once we have denoised the feature map, we can rank the remaining ROIs according to appropriate criteria, for example, size, strength, average strength, and so on. A scale space generated by the wavelet transform is even more attractive. Features of small spatial extent, possibly noise, populate the finer scales. However, true features populate several scales. Because our data exploration system employs the wavelet transform, using scale-based denoising was a natural choice. Another study, which used masks derived from the swirl operator to ascribe a saliency, showed that scale space denoising is more powerful than spatial methods that employ size or value as a criterion for rejecting regions as feature-poor.[8] Thus, we rank regions by measuring the persistence of features in the discrete scale space.

Figure 3. Aggregate-based vortex detection algorithm applied to a delta wing with vortex burst: (a) yellow candidate core regions are actual, green are false; and (b) verified core regions.

Figure 4. Verification process for primary vortex, aggregate classification technique.

Multiscale representation

To obtain a multiscale representation of the feature map, we apply a discrete wavelet transformation (DWT) to the grid and the field data. Because we are interested in the presence of features in a multiscale representation of the data, we do not apply the wavelet transform directly to the feature maps but, instead, to the field data itself. We then apply the feature detection algorithm at each scale to generate scale-dependent binary maps. Then, we combine the binary maps at each scale to generate a single denoised binary map. Because many feature detection algorithms are based on gradients of the field variables, it is critical that the wavelet transform not introduce spurious features.

Although denoising the data might seem to be independent of the application, we contend that the denoising procedure itself should preserve certain feature characteristics. For generality, we use physical characteristics including position, shape, and strength (or geometrical characteristics) instead of the features’ dynamical properties. Our objective is to employ what we term feature-preserving wavelets to generate a multiscale representation of the data. We consider a wavelet feature-preserving if it does not distort geometrical characteristics. Many commonly used wavelet functions do not exhibit favorable feature preservation characteristics. For example, deriving a coarser version of a function can introduce new extrema that can lead to the detection of spurious features.

Elsewhere, we’ve outlined the design of a family of functions that satisfy certain feature preservation properties in a multiscale setting.[9] We implement these functions as the low-pass analysis filter in a two-channel filter bank. We use a factorization method to determine the other components of the filter bank. Because our focus here is feature mining, we will not elaborate further. However, we do stress that it is important to choose appropriate transforms for feature mining applications. The two wavelets we chose for this application were the linear lifting scheme[10] and a newly developed lifting implementation of a feature-preserving total variation diminishing (TVD) wavelet.[9]

Significance map generation

We’ve already obtained the significance or feature map from either the point classification or the aggregate classification feature mining algorithm. Now we convert this map to a binary form, with a 1 signifying the presence of a feature at a grid point.

The multiscale procedure for generating a robust, denoised significance map is as follows. We update the map at the finer scale j using the map at the lower, or coarser, scale j - 1. Because of the 2D wavelet transformation’s dyadic nature, each grid point (l, k) in scale j - 1 corresponds to point (2l, 2k) in scale j. Therefore, if a cell in the coarser scale j - 1 is defined by grid points (l, k), (l + 1, k), (l + 1, k + 1), and (l, k + 1), the corresponding cell in the finer scale j is defined by (2l, 2k), (2l + 2, 2k), (2l + 2, 2k + 2), and (2l, 2k + 2).

The rules for updating scale j using scale j - 1 are as follows:

• If the map at any of the grid points in scale j - 1 is a 0, make the map at the corresponding grid point in scale j a 0.

• If both the grid points on an edge of the cell in scale j - 1 are 0s, make the map at the midpoint on the corresponding edge in scale j a 0.

• If all four grid points in a cell are 0s in scale j - 1, make the map 0 at the midpoint of the corresponding cell in j.

Apart from these changes, the rest of the grid points on the map remain unmodified. Thus, features that did not percolate down to the lower scale are marked with 0s. Because we have added no 1s, we haven’t created any new features. We apply this procedure recursively over two or more scales; that is, we use a denoised map at a lower scale to denoise the higher scale.
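The three update rules and their recursive application can be sketched as below. This is an illustrative reading rather than the authors’ code: maps are nested lists of 0s and 1s, the finer map is assumed to have dimensions (2·nl - 1) × (2·nk - 1) for a coarser map of nl × nk points, and the function names are hypothetical.

```python
def denoise_with_coarser(fine, coarse):
    """Return a copy of the scale-j map with 1s cleared using the scale j-1 map."""
    out = [row[:] for row in fine]
    nl, nk = len(coarse), len(coarse[0])
    for l in range(nl):
        for k in range(nk):
            if coarse[l][k] != 0:
                continue
            # Rule 1: a 0 at (l, k) zeroes the corresponding point (2l, 2k).
            out[2 * l][2 * k] = 0
            # Rule 2: both endpoints of an edge are 0 -> zero the edge midpoint.
            if l + 1 < nl and coarse[l + 1][k] == 0:
                out[2 * l + 1][2 * k] = 0
            if k + 1 < nk and coarse[l][k + 1] == 0:
                out[2 * l][2 * k + 1] = 0
            # Rule 3: all four corners of a cell are 0 -> zero the cell midpoint.
            if (l + 1 < nl and k + 1 < nk
                    and coarse[l + 1][k] == 0
                    and coarse[l][k + 1] == 0
                    and coarse[l + 1][k + 1] == 0):
                out[2 * l + 1][2 * k + 1] = 0
    return out


def denoise_pyramid(maps):
    """Apply the rules recursively; maps[0] is the coarsest scale."""
    result = [maps[0]]
    for fine in maps[1:]:
        result.append(denoise_with_coarser(fine, result[-1]))
    return result
```

Because the function only ever writes 0s, it cannot introduce new features, which matches the property noted in the text.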

Examples

We implemented and tested a 2D version of this approach using a section of the Naval Layered Ocean Model Pacific Ocean data set.[11] Figure 5a shows the original binary swirl map. Figure 5b shows the swirl map obtained after denoising using the linear lifting wavelet; Figure 5c shows that obtained after denoising using a feature-preserving TVD wavelet.

Although it is evident that both filters remove pixels from the maps, visual inspection does not provide a clear indication of the denoising algorithms’ relative merits. Of 339 original features, feature-preserving wavelet denoising removed 82 features in one level of transform, 116 features in two levels, and 158 features in three levels. On the other hand, the linear lifting wavelet eliminated 67, 105, and 150 features for one, two, and three levels of transform.

The feature-preserving wavelet thus eliminated more features than the linear lifting wavelet. What is interesting, however, is the manner in which elimination occurred. Figures 6a and 6b show the distribution of features in the original data and after one, two, and three levels of denoising for the linear and feature-preserving wavelets. We categorized features using their average swirl (using log10 τ). In general, both wavelets do a good job of eliminating weaker features (τ < -2.5). However, even though the feature-preserving wavelet eliminates more features, it preserves more of the strongest features as measured by average swirl (τ > -1.5). Our interpretation is that the feature-preserving wavelets do a better job of preserving the significant features. We can now rank the remaining features using a criterion such as average swirl.
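As a quick worked check of the counts quoted above, the number of features remaining after each level follows by subtraction from the 339 original features. The snippet below uses only the figures given in the text; the variable and function names are illustrative.

```python
ORIGINAL_FEATURES = 339

# Features removed after one, two, and three levels of transform (from the text).
REMOVED = {
    "feature-preserving TVD": (82, 116, 158),
    "linear lifting": (67, 105, 150),
}


def remaining_features(original, removed_per_level):
    """Features left after each level of denoising."""
    return [original - removed for removed in removed_per_level]


for name, removed in REMOVED.items():
    print(name, remaining_features(ORIGINAL_FEATURES, removed))
```

This gives 257, 223, and 181 remaining features for the feature-preserving wavelet and 272, 234, and 189 for the linear lifting wavelet.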

The basis of our efforts to improve feature mining algorithms is the assertion that, for physics-based simulations of complex phenomena, we should exploit the inherent relationships between the various components of the data. Traditional data mining algorithms alone cannot guarantee success. In cases where we understand the underlying physics, at least at some basic level, it makes sense to exploit the known correlations whenever possible. Taken together, application-specific feature detection algorithms and application-independent techniques from traditional data mining provide an arsenal that offers much promise in solving the problem of effective exploration of large data sets.

Figure 5. Wavelet-based denoising: (a) original binary swirl map, (b) map after denoising with the linear lifting wavelet, and (c) map after total variation diminishing denoising. [Three panels; both axes are marked from 50 to 250.]

Figure 6. Denoising effectiveness based on average swirl: (a) linear lifting wavelet and (b) feature-preserving total variation diminishing wavelet. [Bar charts of number of features (0 to 160) versus average swirl, binned as τ > -1.5, -2.0 < τ < -1.5, -2.5 < τ < -2.0, -3.0 < τ < -2.5, and τ < -3.0, for the original data and for data denoised over 1, 2, and 3 levels.]


Acknowledgments

We gratefully acknowledge the valuable contributions of our collaborators on the Evita project: James E. Fowler and Bharat Soni of Mississippi State University and Will Schroeder of Rensselaer Polytechnic Institute. This work is partially funded by the NSF under the Large Data and Scientific Software Visualization Program (ACI-9982344), the Information Technology Research Program (ACS-0085969), and an Early Career Award (ACI-9734483). Additional support was provided by a grant from the Army Research Office (DAA D19-00-1-0155). We also thank the anonymous reviewers for many useful suggestions.

References

1. R. Machiraju et al., “EVITA—Efficient Visualization and Interrogation of Tera-scale Data,” Data Mining for Scientific and Eng. Applications, R. Grossman et al., eds., Kluwer Academic Publishers, Norwell, Mass., 2001, pp. 257–279.

2. J. Han and M. Kamber, Data Mining, Morgan Kaufmann, San Francisco, 2001.

3. M. Roth, Automatic Extraction of Vortex Core Lines and Other Line-Type Features for Scientific Visualization, doctoral thesis, Swiss Federal Institute of Technology, Dept. Computer Science, Zurich, 2000.

4. S.K. Robinson, “Coherent Motions in the Turbulent Boundary Layer,” Ann. Rev. Fluid Mechanics, vol. 23, 1991, pp. 601–639.

5. C. Berdahl and D. Thompson, “Eduction of Swirling Structure using the Velocity Gradient Tensor,” American Inst. of Aeronautics and Astronautics J., vol. 31, no. 1, Jan. 1993, pp. 97–103.

6. M. Jiang, R. Machiraju, and D. Thompson, “Geometric Verification of Features in Flow Fields,” submitted for publication, Proc. IEEE Visualization 2002, IEEE CS Press, Los Alamitos, Calif., 2002.

7. M. Henle, A Combinatorial Introduction to Topology, Dover, New York, 1979.

8. B. Nakshatrala, R. Machiraju, and D. Thompson, “Ranked Representation of Vector Fields,” Proc. Dagstuhl 2000 Seminar on Scientific Visualization, Kluwer Academic Publishers, Norwell, Mass. (to appear).

9. G. Craciun et al., “A Framework for Filter Design Emphasizing Multiscale Feature Preservation,” Proc. Army High-Performance Computing Research Center and Center for Applied Scientific Computation/Lawrence Livermore National Laboratory 3rd Workshop on Mining Scientific Data Sets, Soc. for Industrial and Applied Mathematics, Philadelphia, 2001, pp. 105–111.

10. W. Sweldens, “The Lifting Scheme: A Construction of Second Generation Wavelets,” SIAM J. Mathematical Analysis, vol. 29, no. 2, 1997, pp. 511–546.

11. A.J. Wallcraft, The Naval Layered Ocean Model Users Guide, tech. report 35, Naval Oceanographic and Atmospheric Research Laboratory, Bay St. Louis, Miss., 1999.

David S. Thompson is an associate research professor at the Center for Computational Systems at Mississippi State University. His research interests include feature mining in large data sets, mesh generation, and aircraft icing. He received a BS and an MS in aerospace engineering from Mississippi State and a PhD in aerospace engineering from Iowa State University. He is a member of the IEEE. Contact him at Box 9627, Mississippi State Univ., MS 39762-9627; [email protected].

Raghu K. Machiraju is an assistant professor of computer and information science at Ohio State University. His research interests include large data visualization and analysis, modeling, and image synthesis. He received his PhD from Ohio State Univ. Contact him at 395 Dreese Labs, 2015 Neil Avenue Mall, Ohio State Univ., Columbus, OH 43210; [email protected].

Ming Jiang is a PhD candidate in computer science at Ohio State University. His research interests include feature detection, flow visualization, and computer graphics. He received his BS in computer science from Ohio State University. Contact him at 395 Dreese Labs, 2015 Neil Avenue Mall, Ohio State Univ., Columbus, OH 43210; [email protected].

Jaya Sreevalsan Nair is a graduate student completing her MS in the computational engineering program at Mississippi State University. She plans to continue her education in the PhD program in computer science at the University of California, Davis, in the Fall. Her research interests include wavelets and visualization. She received a BS in aerospace engineering from the Indian Institute of Technology at Madras. Contact her at Box 9627, Mississippi State Univ., MS 39762-9627; [email protected].

Gheorghe Craciun is a PhD candidate in the mathematics department at Ohio State University. He is interested in dynamical systems, wavelet design, and chemical and biological applications. He has a BS in mathematics from Bucharest University and an MS in mathematics and an MS in computer science from Ohio State University. He is a member of the American Mathematical Society. Contact him at the Dept. of Mathematics, Ohio State Univ., Columbus, OH 43210; [email protected].

Satya Sridhar Dusi Venkata recently completed his MS in computational engineering at Mississippi State University. His research interests include mesh generation and feature detection. He received a BS in mechanical engineering from the Indian Institute of Technology at Madras. Contact him at 89 Beechwood Place, Plano, TX 75075; [email protected].

For more information on this or any other computing topic, please visit our digital library at http://computer.org/publications/dlib.

