NERCOMP Workshop, Dec. 2, 2008
Information Visualization: the Other Half of Data Analysis
Dr. Matthew Ward
Computer Science Department
Worcester Polytechnic Institute
A Data Analysis Pipeline
  [Diagram] Raw Data -> (Cleaning, Filtering, Transforming) -> Processed Data -> (Statistical Analysis, Pattern Recognition, Knowledge Discovery) -> Hypotheses/Models -> (Validation) -> Results
  Labels A, B, C, and D mark points along the pipeline
Where Does Visualization Come In?
  All stages can benefit from visualization
    A: identify bad data, select subsets, help choose transforms (exploratory)
    B: help choose computational techniques, set parameters, use vision to recognize, isolate, classify patterns (exploratory)
    C: superimpose derived models on data (confirmatory)
    D: present results (presentation)
What do we need to know to do Information Visualization?
  Characteristics of data
    Types, size, structure
    Semantics, completeness, accuracy
  Characteristics of user
    Perceptual and cognitive abilities
    Knowledge of domain, data, tasks, tools
  Characteristics of graphical mappings
    What are the possibilities
    Which convey data effectively and efficiently
  Characteristics of interactions
    Which support the tasks best
    Which are easy to learn, use, remember
Issues Regarding Data
  Type may indicate which graphical mappings are appropriate
    Nominal vs. ordinal
    Discrete vs. continuous
    Ordered vs. unordered
    Univariate vs. multivariate
    Scalar vs. vector vs. tensor
    Static vs. dynamic
    Values vs. relations
  Trade-offs between size and accuracy needs
  Different orders/structures can reveal different features/patterns
Issues Regarding Users
  What graphical attributes do we perceive accurately?
  What graphical attributes do we perceive quickly?
  Which combinations of attributes are separable?
  Coping with change blindness
  How can visuals support the development of accurate mental models of the data?
  Relative vs. absolute judgements: impact on tasks
Issues Regarding Mappings
  Variables include shape, size, orientation, color, texture, opacity, position, motion…
  Some of these have an order, others don’t
  Some use up significant screen space
  Sensitivity to occlusion
  Domain customs/expectations
Image source: www3.sympatico.ca/blevis/Image10.gif
Issues Regarding Interactions
  Interaction is a critical component
  Many categories of techniques
    Navigation, selection, filtering, reconfiguring, encoding, connecting, and combinations of the above
  Many “spaces” in which interactions can be applied
    Screen/pixels, data, data structures, graphical objects, graphical attributes, visualization structures
Importance of Evaluation
  Easy to design bad visualizations
  Many design rules exist; many conflict, many are routinely violated
  5 E’s of evaluation: effective, efficient, engaging, error tolerant, easy to learn
  Many styles of evaluation (qualitative and quantitative):
    Use/case studies
    Usability testing
    User studies
    Longitudinal studies
    Expert evaluation
    Heuristic evaluation
Different Rules -> Different Views
Courtesy of Aisee.com
Categories of Mappings
  Based on data characteristics
    Numbers, text, graphs, software, …
  Logical groupings of techniques (Keim)
    Standard: bars, lines, pie charts, scatterplots
    Geometrically transformed: landscapes, parallel coordinates
    Icon-based: stick figures, faces, profiles
    Dense pixels: recursive segments, pixel bar charts
    Stacked: treemaps, dimensional stacking
  Based on dimension management (Ward)
    Dimension subsetting: scatterplots, pixel-oriented methods
    Dimension reconfiguring: glyphs, parallel coordinates
    Dimension reduction: PCA, MDS, Self Organizing Maps
    Dimension embedding: dimensional stacking, worlds within worlds
Scatterplot Matrix
  Each pair of dimensions generates a single scatterplot
  All combinations arranged in a grid or matrix, each dimension controls a row or column
  Look for clusters, outliers, partial correlations, trends
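The grid layout described above can be sketched in a few lines. This is only an illustrative sketch, not XmdvTool's implementation: the function name `scatterplot_matrix_cells` and the sample records are invented here, and it assumes the diagonal cells simply plot a dimension against itself.

```python
from itertools import product


def scatterplot_matrix_cells(records, dims):
    """Return {(row_dim, col_dim): [(x, y), ...]} for every cell of the grid."""
    cells = {}
    for row_dim, col_dim in product(dims, repeat=2):
        # The column's dimension supplies x, the row's dimension supplies y.
        cells[(row_dim, col_dim)] = [(r[col_dim], r[row_dim]) for r in records]
    return cells


# Hypothetical records, invented purely for illustration.
records = [
    {"mpg": 18.0, "hp": 130, "weight": 3504},
    {"mpg": 32.0, "hp": 65, "weight": 2110},
    {"mpg": 24.0, "hp": 95, "weight": 2600},
]
cells = scatterplot_matrix_cells(records, ["mpg", "hp", "weight"])
```

With N dimensions the grid has N x N cells, so the number of plots grows quadratically with the number of dimensions.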
Parallel Coordinates
  Each variable/dimension is a vertical line
  Bottom of line is low value, top is high
  Each record creates a polyline across all dimensions
  Similar records cluster on the screen
  Look for clusters, outliers, line angles, crossings
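One way to turn a record into its polyline is sketched below. The names `axis_ranges` and `polyline`, the unit axis spacing, and the sample records are assumptions made for this example. The y coordinate increases upward, so 0 is the bottom (low value) of an axis and `axis_height` is the top.

```python
def axis_ranges(records, dims):
    """Minimum and maximum of each dimension, used to scale every axis."""
    return {d: (min(r[d] for r in records), max(r[d] for r in records))
            for d in dims}


def polyline(record, dims, ranges, axis_gap=1.0, axis_height=1.0):
    """Return one (x, y) vertex per dimension for a single record."""
    points = []
    for i, dim in enumerate(dims):
        lo, hi = ranges[dim]
        # Normalize to [0, 1]; a constant dimension is drawn at mid-height.
        t = 0.5 if hi == lo else (record[dim] - lo) / (hi - lo)
        points.append((i * axis_gap, t * axis_height))
    return points


# Hypothetical records, invented purely for illustration.
records = [
    {"mpg": 18.0, "hp": 130, "weight": 3504},
    {"mpg": 32.0, "hp": 65, "weight": 2110},
    {"mpg": 24.0, "hp": 95, "weight": 2600},
]
dims = ["mpg", "hp", "weight"]
ranges = axis_ranges(records, dims)
lines = [polyline(r, dims, ranges) for r in records]
```

Changing the order of `dims` changes which axes are adjacent, and therefore which line angles and crossings are visible.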
Star Glyph
  Glyphs are shapes whose attributes are controlled by data values
  Star glyph is a set of N rays spaced at equal angles
  Length of each ray proportional to value for that dimension
  Line connects all endpoints of shape
  Lay glyphs out in rows and columns
  Look for shape similarities and differences, trends
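The geometry of a single star glyph can be sketched as follows. The function name `star_glyph` is invented for this example, and it assumes the values have already been normalized to [0, 1] and that the first ray points along the positive x axis.

```python
import math


def star_glyph(values, radius=1.0):
    """Return the closed outline of a star glyph centered at the origin."""
    n = len(values)
    points = []
    for k, v in enumerate(values):
        angle = 2 * math.pi * k / n  # N rays spaced at equal angles
        length = v * radius          # ray length proportional to the value
        points.append((length * math.cos(angle), length * math.sin(angle)))
    # Repeat the first endpoint so the connecting line closes the shape.
    return points + points[:1]


outline = star_glyph([1.0, 0.5, 1.0, 0.5])
```

To lay glyphs out in rows and columns, each outline would be translated to the center of its grid cell before drawing.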
Other Types of Glyphs
Dimensional Stacking
  Break each dimension range into bins
  Break the screen into a grid using the number of bins for 2 dimensions
  Repeat the process for 2 more dimensions within the subimages formed by first grid, recurse through all dimensions
  Look for repeated patterns, outliers, trends, gaps
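The recursive subdivision above amounts to a mixed-radix address for each record. The sketch below is one possible formulation, not the XmdvTool code: `bin_index` and `stacked_cell` are hypothetical names, and it assumes dimensions are listed outermost first and alternate between the horizontal and vertical directions.

```python
def bin_index(value, lo, hi, count):
    """Map a value in [lo, hi] to one of `count` equal-width bins."""
    if hi == lo:
        return 0
    i = int((value - lo) / (hi - lo) * count)
    return min(max(i, 0), count - 1)  # clamp so that value == hi stays in range


def stacked_cell(bin_indices, bin_counts):
    """Return the (column, row) of the innermost cell holding a record.

    Even positions (0, 2, ...) subdivide horizontally, odd positions
    (1, 3, ...) vertically; each later pair nests inside the earlier one.
    """
    col = row = 0
    for level, (idx, count) in enumerate(zip(bin_indices, bin_counts)):
        if not 0 <= idx < count:
            raise ValueError("bin index out of range")
        if level % 2 == 0:
            col = col * count + idx
        else:
            row = row * count + idx
    return col, row


# Four dimensions with 3, 2, 4 and 5 bins: a 3x2 outer grid of 4x5 subimages.
cell = stacked_cell([1, 0, 2, 3], [3, 2, 4, 5])
```

Counting how many records fall into each cell then gives the value to display there; empty cells show up as gaps.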
Pixel-Oriented Techniques
  Each dimension creates an image
  Each value controls color of a pixel
  Many organizations of pixels possible (raster, spiral, circle segment, space-filling curves)
  Reordering data can reveal interesting features, relations between dimensions
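A minimal sketch of the simplest organization, a raster layout, is shown below. The function name `raster_image` and the choice of a 0 to 255 gray scale are assumptions for this example; other layouts (spiral, space-filling curves) would change only how positions are assigned.

```python
def raster_image(values, width):
    """Lay out one dimension's values row by row as gray levels 0..255."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    span = hi - lo
    rows = []
    for start in range(0, len(values), width):
        row = []
        for v in values[start:start + width]:
            # Scale each value linearly onto the gray ramp.
            row.append(0 if span == 0 else round(255 * (v - lo) / span))
        rows.append(row)
    return rows


values = [0, 2, 10, 10, 2, 0]
image = raster_image(values, 3)
reordered = raster_image(sorted(values), 3)  # sorting turns the image into a ramp
```

Applying the same record ordering to the image of every dimension is what lets relations between dimensions show up as similar patterns.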
Methods to Cope with Scale
  Many modern datasets contain large number of records (millions and billions) and/or dimensions (hundreds and thousands)
  Several strategies to handle scale problems
    Sampling
    Filtering
    Clustering/aggregation
  Techniques can be automated or user-controlled
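The three strategies can be sketched on a list of records as follows. All function names here are invented for illustration, and fixed-width binning is used as a simple stand-in for aggregation; a real clustering algorithm would replace it.

```python
import random
from collections import defaultdict


def sample(records, k, seed=0):
    """Sampling: keep a random subset of at most k records."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))


def filter_range(records, dim, lo, hi):
    """Filtering: keep only records whose value for dim lies in [lo, hi]."""
    return [r for r in records if lo <= r[dim] <= hi]


def aggregate(records, dim, bin_width):
    """Aggregation: replace each group of records by a count and a mean."""
    groups = defaultdict(list)
    for r in records:
        groups[int(r[dim] // bin_width)].append(r[dim])
    return {b: {"count": len(vals), "mean": sum(vals) / len(vals)}
            for b, vals in groups.items()}


# Hypothetical one-dimensional records, invented purely for illustration.
records = [{"x": v} for v in range(10)]
subset = sample(records, 3)
kept = filter_range(records, "x", 2, 4)
summary = aggregate(records, "x", 5)
```

The parameters (`k`, the range bounds, `bin_width`) are the natural places for user control; an automated version would choose them from the data or the available screen space.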
Examples of Data Clustering
Example of Dimension Clustering
Example of Data Sampling
The Visual Data Analysis (VDA) Process
  Overview
  Filter/cluster/sample
  Scan
  Select “interesting”
  Details on demand
  Link between different views
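The select and link steps of this process are often realized by sharing one selection among all views. The sketch below assumes that approach; the names `brush` and `highlight_flags` and the sample records are invented for this example.

```python
def brush(records, dim, lo, hi):
    """Select "interesting" records: indices whose value for dim is in [lo, hi]."""
    return {i for i, r in enumerate(records) if lo <= r[dim] <= hi}


def highlight_flags(records, selected):
    """Per-record flags that any view can use to draw selected records differently."""
    return [i in selected for i in range(len(records))]


# Hypothetical records, invented purely for illustration.
records = [
    {"mpg": 18.0, "hp": 130},
    {"mpg": 32.0, "hp": 65},
    {"mpg": 24.0, "hp": 95},
]
selected = brush(records, "hp", 90, 140)    # selection made in one view
flags = highlight_flags(records, selected)  # reused by every other view
details = [records[i] for i in sorted(selected)]  # details on demand
```

Because the selection is stored as record indices rather than screen positions, it stays meaningful in any view of the same data.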
Demonstration
Summary
  Visualization a powerful component of the data analysis process
  Each stage of analysis can be enhanced
  Visualization can help guide computational analysis, and vice versa
  Multiple linked views and a rich assortment of interactions key to success
For Further Info on XmdvTool
  http://davis.wpi.edu/~xmdv
  Contains source code, Windows executable, data sets, documentation, copies of most Xmdv publications, case studies
  We gratefully acknowledge support for the development of XmdvTool from the National Science Foundation (IIS-9732897, IRIS-9729878, IIS-0119276, IIS-0414380, CCF-0811510, and IIS-0812027) and the National Security Agency.
Questions?