Date post: 28-Jan-2021
  • 1

    1

    Interactive Visual Exploration of High Dimensional Datasets

    Jing Yang

    Fall 2008

    2

    Challenges of High Dimensional Datasets

    High dimensional datasets are common: digital libraries, bioinformatics, simulations, process monitoring, and surveys

    Examples:
        Ticdata2000 dataset: 86 dimensions
        OHSUMED dataset: 215 dimensions
        SkyServer dataset: 361 dimensions

    Challenges of visualizing high dimensional datasets:
        Clutter on the screen
        Difficult user navigation in the data space

  • 2

    3

    Example

    215*215 = 46,225 plots

    OHSUMED dataset: 215 dimensions

    215 axes

    4

    Visual Hierarchical Dimension Reduction (VHDR)

    J. Yang, M.O. Ward, E.A. Rundensteiner and S. Huang

    VisSym’03

  • 3

    5

    Motivation - Dimension Reduction

    Idea:
        Project a high-dimensional dataset to a lower-dimensional subspace
        Visualize data items in the lower-dimensional subspace

    Existing approaches:
        Principal Component Analysis
        Multidimensional Scaling
        Kohonen’s Self-Organizing Map

    Problems:
        Information loss
        No intuitive meaning of generated dimensions
        Little user interaction allowed

    6

    Inspiration

    Hierarchical parallel coordinate: data item hierarchy

  • 4

    7

    Key Ideas of VHDR

    Use a dimension hierarchy to convey dimension relationships
    Allow users to learn the dimension hierarchy
    Allow users to select dimensions or dimension clusters to form subspaces of interest

    8

    Dimension Hierarchy

    Similar dimensions form clusters; clusters are grouped into larger clusters

    a dimension hierarchy of a 5-d dataset
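A minimal sketch of how such a hierarchy could be built bottom-up, assuming a pairwise dissimilarity matrix is already available. Average-linkage agglomerative clustering is used here as a stand-in; the slides do not say which clustering algorithm VHDR uses. The function name `build_hierarchy` is hypothetical, and the 4x4 matrix is the d1 to d4 example from the Value and Relation Display slide later in this deck.

```python
def build_hierarchy(names, dissim):
    """Merge the two closest clusters repeatedly; return a nested-tuple tree.

    names:  list of dimension names
    dissim: square matrix (list of lists) of pairwise dissimilarities
    Average linkage is an assumption, not the method from the slides.
    """
    # Each cluster is (subtree, indices of the member dimensions).
    clusters = [(name, [i]) for i, name in enumerate(names)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                members_a, members_b = clusters[a][1], clusters[b][1]
                total = sum(dissim[i][j] for i in members_a for j in members_b)
                avg = total / (len(members_a) * len(members_b))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][0]


NAMES = ["d1", "d2", "d3", "d4"]
DISSIM = [
    [0.0, 0.7, 0.6, 0.7],
    [0.7, 0.0, 0.3, 0.2],
    [0.6, 0.3, 0.0, 0.1],
    [0.7, 0.2, 0.1, 0.0],
]
tree = build_hierarchy(NAMES, DISSIM)
print(tree)
```

The nested tuples mirror the slide's picture: leaves are dimensions, inner tuples are dimension clusters.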

  • 5

    9

    VHDR Framework

    Step 1: build dimension hierarchy
    Step 2: navigate and manipulate dimension hierarchy
    Step 3: interactively select clusters from dimension hierarchy to form lower-dimensional subspaces

    10

    Overview

  • 6

    11

    Build Dimension Hierarchy

    Automatic dimension clustering:
        Cluster dimensions according to the dissimilarities among them
        Dissimilarity: a measure of how dissimilar dimensions are to each other

    Manual hierarchy modification

    Discussion:
        How to calculate dissimilarity between two dimensions?

    Ref: Ankerst, M., Berchtold, S., and Keim, D. A. Similarity clustering of dimensions for an enhanced visualization of multidimensional data. InfoVis’98
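One possible answer to the discussion question, sketched under an assumption: dissimilarity is taken as one minus the absolute Pearson correlation of the two dimensions' values. This is only one of many candidate measures and is not claimed to be the one used in VHDR or in the cited paper. The function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long value lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant dimension carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def dissimilarity(a, b):
    """0 for perfectly (anti)correlated dimensions, 1 for uncorrelated ones."""
    return 1.0 - abs(pearson(a, b))


print(dissimilarity([1, 2, 3, 4], [2, 4, 6, 8]))
print(dissimilarity([1, 2, 3, 4], [1, -1, -1, 1]))
```

Treating strong negative correlation as "similar" is a design choice; a measure based on value differences would rate such dimensions as far apart.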

    12

    Navigate and Manipulate Dimension Hierarchy

    InterRing - radial space-filling hierarchy navigation tool [yang:2002]
        Modification
        Selection
        Radius distortion
        Circular distortion
        Rolling up/Drilling down
        Rotation
        Zooming/Panning

  • 7

    13

    Construct Lower-Dimensional Subspaces

    Strategy 1: construct a subspace with closely related dimensions

    14

    Construct Lower-Dimensional Subspaces

    Strategy 2: construct a subspace that covers major variance of the dataset

  • 8

    15

    Dimension Cluster Representation

    Representative Dimension - a dimension that represents a cluster of dimensions

    Approaches to assigning or generating a representative dimension:

    1. Select a dimension from the cluster
    2. Average all dimensions in the cluster
    3. Use principal component analysis to generate a weighted sum of dimensions within a cluster
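The first two approaches can be sketched as follows, assuming every dimension has been normalized to a common range and is stored as a list with one value per data item. The criterion used to pick a member dimension (closest to the cluster average) is a hypothetical choice; the slides do not state how the selected dimension is chosen.

```python
def average_representative(cluster):
    """Approach 2: per data item, average the values of all member dimensions."""
    n_dims = len(cluster)
    return [sum(values) / n_dims for values in zip(*cluster)]


def select_representative(cluster):
    """Approach 1: pick one existing member dimension.

    Assumption: the member closest (squared distance) to the average is chosen.
    """
    avg = average_representative(cluster)

    def squared_distance(dim):
        return sum((v - a) ** 2 for v, a in zip(dim, avg))

    return min(cluster, key=squared_distance)


cluster = [
    [0.0, 1.0, 2.0],
    [2.0, 3.0, 4.0],
    [1.0, 2.0, 2.0],
]
print(average_representative(cluster))
print(select_representative(cluster))
```

Averaging produces a new, synthetic dimension, while selection keeps a dimension whose meaning users already know.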

    16

    Examples

    Approach - average representative dimension

    Approach - select representative dimension

  • 9

    17

    Dissimilarity Representation

    Approaches:
        Axis Width
        Three Axes
        Diagonal Plots
        Outer and Inner Sticks
        Mean-Band

    Goal: visualize dissimilarity of dimensions in a dimension cluster.

    Example: select 3 dimension clusters (dimensions) in Census-Income dataset

    18

    Dissimilarity Representation : Axis Width Method

  • 10

    19

    Dissimilarity Representation : Three Axis Method

    20

    Dissimilarity Representation : Three Axis Method

  • 11

    21

    Dissimilarity Representation : Diagonal Plots Method

    No dissimilarities representation

    Dissimilarities represented in diagonal plots

    22

    Generality

    VHDR is a general framework that can be extended to multiple display techniques

    We have applied VHDR to:
        Parallel Coordinates
        Star Glyphs
        Scatterplot Matrices
        Dimensional Stacking

        Hierarchical Parallel Coordinates
        Hierarchical Star Glyphs
        Hierarchical Scatterplot Matrices
        Hierarchical Dimensional Stacking

  • 12

    23

    Other Clustering Approach

    Visualization of Large-Scale Customer Satisfaction Surveys Using a Parallel Coordinate Tree, D. Brodbeck et al., InfoVis 2003

    24

  • 13

    25

    Interactive Hierarchical Dimension Ordering, Spacing and Filtering For Exploration of High Dimensional Datasets

    Jing Yang, Wei Peng, Matthew O. Ward and Elke A. Rundensteiner

    InfoVis’03

    26

    Motivation

    Large number of dimensions need to be managed

    Ordering, spacing, filtering etc.

  • 14

    27

    Overview

    General: includes dimension ordering, dimension spacing and dimension filtering
    Interactive: allows user interactions throughout the whole process
    Hierarchical: groups dimensions into a hierarchy and builds most algorithms and user interactions upon this hierarchy to increase scalability

    28

    Dimension Ordering (1)

    Random order

  • 15

    29

    Dimension Ordering (2)

    Ordered by similarity

    30

    Dimension Ordering (3)

    Order dimensions according to different purposes:
        Similarity-oriented ordering: put similar dimensions close to each other
        Importance-oriented ordering: map more important dimensions to more significant positions or attributes. The order of importance can be decided by Principal Component Analysis (PCA)

  • 16

    31

    Dimension Ordering (4)

    Challenges for ordering high dimensional datasets:
        Similarity-oriented ordering is NP-complete
        It is hard to decide the order of importance of a large number of dimensions using PCA

    Our solution: reduce the complexity of the ordering problem using the dimension hierarchy
        Order each dimension cluster
        The order of the dimensions is decided by a depth-first traversal of the dimension hierarchy
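A minimal sketch of the hierarchical idea: order the children of each cluster separately, which is cheap because each cluster is small, then read off the final order with a depth-first traversal. The hierarchy is assumed to be given as nested tuples of dimension names, and the per-cluster ordering here is a brute-force search over permutations, which is an illustrative choice and not necessarily the algorithm from the paper. All names and the toy dissimilarities are hypothetical.

```python
from itertools import permutations


def leaves(node):
    """All dimension names below a node of the hierarchy."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def child_dissim(a, b, dissim):
    """Average dissimilarity between the dimensions of two sibling subtrees."""
    la, lb = leaves(a), leaves(b)
    return sum(dissim[x][y] for x in la for y in lb) / (len(la) * len(lb))


def order_cluster(children, dissim):
    """Order one cluster's children so that neighbours are as similar as possible."""
    def cost(perm):
        return sum(child_dissim(perm[i], perm[i + 1], dissim)
                   for i in range(len(perm) - 1))
    return min(permutations(children), key=cost)


def hierarchical_order(node, dissim):
    """Depth-first traversal that visits each cluster's children in their local order."""
    if isinstance(node, str):
        return [node]
    order = []
    for child in order_cluster(list(node), dissim):
        order.extend(hierarchical_order(child, dissim))
    return order


# Toy data: dimensions placed on a line; dissimilarity is the distance between them.
POSITION = {"a": 0, "b": 1, "c": 2, "d": 10, "e": 11}
DIMENSION_DISSIM = {x: {y: abs(POSITION[x] - POSITION[y]) for y in POSITION}
                    for x in POSITION}
HIERARCHY = (("c", "a", "b"), ("e", "d"))
ordering = hierarchical_order(HIERARCHY, DIMENSION_DISSIM)
print(ordering)
```

Each permutation search only sees one cluster's children, so the cost depends on the largest cluster size instead of the total number of dimensions. This sketch does not try to optimize adjacency across cluster boundaries.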

    32

    Hierarchical Ordering Illustration

  • 17

    33

    Dimension Ordering (6)

    Random order Similarity-oriented order

    34

    Dimension Spacing (1)

    Idea of dimension spacing:
        Convey dimension relationship information by varying the spacing between adjacent axes

  • 18

    35

    Dimension Spacing (2)

    Dimensions spaced according to similarity: similar dimensions are close to each other
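A minimal sketch of similarity-based spacing for parallel-coordinate axes, assuming the dimensions are already ordered and the dissimilarity between each pair of adjacent axes is known. The mapping used here (gap proportional to dissimilarity, plus a small guaranteed minimum gap) is an assumption for illustration; `axis_positions` and `min_gap_share` are hypothetical names.

```python
def axis_positions(adjacent_dissim, width=1.0, min_gap_share=0.2):
    """Return x positions of the axes, left to right.

    adjacent_dissim[i] is the dissimilarity between axis i and axis i + 1.
    min_gap_share is the fraction of the width split evenly between all gaps,
    so that very similar axes do not collapse onto each other.
    """
    gaps = len(adjacent_dissim)
    if gaps == 0:
        return [0.0]
    total = sum(adjacent_dissim)
    positions = [0.0]
    for d in adjacent_dissim:
        share = d / total if total > 0 else 1.0 / gaps
        gap = width * (min_gap_share / gaps + (1.0 - min_gap_share) * share)
        positions.append(positions[-1] + gap)
    return positions


print(axis_positions([0.1, 0.3, 0.6], width=100.0, min_gap_share=0.0))
```

With `min_gap_share=0.0` the gaps are purely proportional to dissimilarity; raising it trades relationship information for legibility.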

    36

    Dimension Spacing Distortion (1)

  • 19

    37

    Dimension Spacing Distortion (2)

    Before After

    38

    Dimension Filtering (1)

    Idea of dimension filtering:
        Similar dimensions can be omitted
        Unimportant dimensions can be omitted
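Both filtering rules can be sketched as a single pass over the dimensions, assuming a pairwise dissimilarity table and a per-dimension importance score are available. The thresholds, the importance scores and the function name are hypothetical; the dissimilarities are the d1 to d4 values from the Value and Relation Display slide later in this deck. The actual approach works on the dimension hierarchy, which this flat sketch leaves out.

```python
def filter_dimensions(names, dissim, importance, similar_below=0.15, important_above=0.1):
    """Keep a dimension only if it is important and not too similar to a kept one."""
    kept = []
    # Visit the most important dimensions first so they win over their near-duplicates.
    for name in sorted(names, key=lambda n: -importance[n]):
        if importance[name] < important_above:
            continue  # unimportant dimension: omit
        if any(dissim[name][k] < similar_below for k in kept):
            continue  # similar to a dimension that is already shown: omit
        kept.append(name)
    return kept


NAMES = ["d1", "d2", "d3", "d4"]
MATRIX = [
    [0.0, 0.7, 0.6, 0.7],
    [0.7, 0.0, 0.3, 0.2],
    [0.6, 0.3, 0.0, 0.1],
    [0.7, 0.2, 0.1, 0.0],
]
DISSIM = {a: dict(zip(NAMES, row)) for a, row in zip(NAMES, MATRIX)}
IMPORTANCE = {"d1": 0.9, "d2": 0.05, "d3": 0.6, "d4": 0.5}  # made-up scores
print(filter_dimensions(NAMES, DISSIM, IMPORTANCE))
```

Here d4 is dropped because it nearly duplicates d3, and d2 is dropped because its importance is below the threshold.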

  • 20

    39

    Dimension Filtering (2)

    Unfiltered Filtered

    40

    Conclusion

    The proposed approach:
        Improves the manageability of dimensions in high dimensional datasets and reduces the complexity of the ordering, spacing and filtering tasks
        Allows flexible user interactions for dimension ordering, spacing and filtering with dimension hierarchies

  • 21

    41

    Value and Relation (VaR) Display

    Jing Yang, Anilkumar Patro, Shiping Huang, Nishant Mehta, Matthew O. Ward and Elke A. Rundensteiner

    InfoVis’04

    42

    Motivation

    Challenges:
        Can high dimensional datasets be visualized without dimension reduction, to avoid information loss?
        Can dimension relationships be visualized in the same display as data values?

  • 22

    43

    Challenge - Visualization without Dimension Reduction

    Visualize SkyServer dataset (361 dimensions) using existing techniques:
        Parallel Coordinates: 361 axes
        Scatterplot Matrix: 130,321 scatterplots
        Pixel-Oriented techniques without overlaps: 50,000 data items: 18,050,000 pixels (23 times the number of pixels in a 1024*768 screen)

    Hint: Use Pixel-Oriented techniques and allow overlaps
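The numbers on this slide, and the 46,225 plots quoted earlier for the 215-dimension OHSUMED dataset, follow from simple counting. A small sketch that reproduces them, assuming one pixel per data value for the pixel-oriented case; the function name is hypothetical.

```python
def display_cost(n_dims, n_items, screen=(1024, 768)):
    """Count the visual elements needed to show a dataset without dimension reduction."""
    axes = n_dims                      # parallel coordinates: one axis per dimension
    scatterplots = n_dims * n_dims     # scatterplot matrix: one plot per dimension pair
    pixels = n_dims * n_items          # pixel-oriented: one pixel per data value
    screens = pixels / (screen[0] * screen[1])
    return axes, scatterplots, pixels, screens


axes, scatterplots, pixels, screens = display_cost(361, 50_000)
print(axes, scatterplots, pixels, round(screens, 2))
print(display_cost(215, 0)[1])  # scatterplot count for OHSUMED
```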

    44

    Challenge - Dimension Relationship Visualization

    Sorting dimensions in a 1D or 2D grid [Ankerst 98]
        Not effective beyond hundreds of dimensions

    Spacing between dimensions [Yang 2003]
        Only relationships of adjacent dimensions are revealed

    Pixel-Oriented: Sort 50 dimensions in a 2D grid [Ankerst 98]

  • 23

    45

    Challenge - Dimension Relationship Visualization (cont.)

    Recall data item relationship visualization:
        SPIRE Galaxies: map data items to a 2D display using MDS [Wise:95]

    Hint: Use MDS to lay out dimensions

    46

    Our Proposal: Value and Relation (VaR) Display

             d1    d2    d3    d4
        d1   0     0.7   0.6   0.7
        d2   0.7   0     0.3   0.2
        d3   0.6   0.3   0     0.1
        d4   0.7   0.2   0.1   0

    Multi-Dimensional Scaling (layout of d1, d2, d3, d4)

    Pixel-Oriented glyph
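A minimal sketch of the layout step, using the 4x4 matrix from this slide as pairwise target distances between dimensions. The slides only name Multi-Dimensional Scaling; the simple gradient-descent stress minimization below is a stand-in chosen for brevity and is not claimed to be the MDS variant used in the VaR display. Function names, step size and iteration count are assumptions.

```python
import math
import random


def mds_layout(dissim, iterations=2000, step=0.05, seed=0):
    """Place each dimension in 2D so that screen distances approximate dissim."""
    rng = random.Random(seed)
    n = len(dissim)
    pos = [[rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)] for _ in range(n)]
    for _ in range(iterations):
        for i in range(n):
            grad_x = grad_y = 0.0
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                dist = math.hypot(dx, dy) or 1e-9
                error = dist - dissim[i][j]  # positive: too far apart on screen
                grad_x += error * dx / dist
                grad_y += error * dy / dist
            pos[i][0] -= step * grad_x
            pos[i][1] -= step * grad_y
    return pos


def layout_distance(pos, a, b):
    """Screen distance between two placed dimensions."""
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])


DISSIM = [
    [0.0, 0.7, 0.6, 0.7],
    [0.7, 0.0, 0.3, 0.2],
    [0.6, 0.3, 0.0, 0.1],
    [0.7, 0.2, 0.1, 0.0],
]
layout = mds_layout(DISSIM)
for name, (x, y) in zip(["d1", "d2", "d3", "d4"], layout):
    print(name, round(x, 2), round(y, 2))
```

In the resulting layout d3 and d4 should sit close together and d1 apart from the rest, which is the relationship the glyph positions are meant to convey.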

  • 24

    47

    SkyServer dataset: 361 dimensions, 50,000 data items

    Value and Relation Display

    Features:
        Explicitly conveys data values without dimension reduction
        Explicitly conveys dimension relationships
        Provides a rich set of interaction tools

    48

    Overlap Detection and Reduction

    Extent Scaling
    Dynamic Masking
    Zooming and Panning
    Showing Names
    Layer Reordering
    Manual Relocation
    Automatic Shifting

    SkyServer Dataset

  • 25

    49

    Distortion

    Goal: Focus-within-context

    Features:
        Enlarges clicked glyphs
        Keeps size of other glyphs

    SkyServer Dataset

    50

    Data Item Reordering

    Pixel-oriented techniques: Data item ordering is critical

    VaR display:
        Initial display
        Manual reordering

    Census-Income-Part Dataset: 42 dimensions, 20,000 data items

  • 26

    51

    Comparing

    Goal: Compare base dimension with all others

    Feature: Coloring by value difference of dimensions being compared

    AAUP Dataset: 14 dimensions, 1,131 data items

    52

    Selection

    Goal: Select dimensions for further interaction or visualization

    Selection tools in VaR display:
        Manual selection - flexibility
        Automatic selection - efficiency
            Select related dimensions
            Select unrelated dimensions

  • 27

    53

    Automatic Selection for Unrelated Dimensions

    Input:
        A base dimension
        “Related” threshold

    Output: Dimensions covering major data variance

    Algorithm: Iteratively select unrelated dimensions and filter related dimensions

    Related work: Maximum subspace [MacEachren:03]

    SkyServer Dataset
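The iterative select-and-filter loop can be sketched as follows. Assumptions: two dimensions count as "related" when their dissimilarity is below the threshold, and the next dimension picked is the remaining candidate farthest from everything selected so far. Neither detail is given on the slide, and the function name is hypothetical. The dissimilarities are the d1 to d4 values from the earlier VaR slide.

```python
def select_unrelated(base, names, dissim, related_below):
    """Greedy selection of mutually unrelated dimensions, starting from base."""
    selected = [base]
    candidates = [n for n in names if n != base]
    while True:
        # Filter step: drop every candidate related to an already selected dimension.
        candidates = [n for n in candidates
                      if all(dissim[n][s] >= related_below for s in selected)]
        if not candidates:
            break
        # Select step (assumed rule): take the candidate least related to the selection.
        nxt = max(candidates, key=lambda n: min(dissim[n][s] for s in selected))
        selected.append(nxt)
        candidates.remove(nxt)
    return selected


NAMES = ["d1", "d2", "d3", "d4"]
MATRIX = [
    [0.0, 0.7, 0.6, 0.7],
    [0.7, 0.0, 0.3, 0.2],
    [0.6, 0.3, 0.0, 0.1],
    [0.7, 0.2, 0.1, 0.0],
]
DISSIM = {a: dict(zip(NAMES, row)) for a, row in zip(NAMES, MATRIX)}
print(select_unrelated("d2", NAMES, DISSIM, related_below=0.25))
```

With base d2 and threshold 0.25, d4 is filtered out as related to d2, while d1 and d3 are kept.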

    54

    Scale to Large Datasets

    Store glyphs as texture objects
        Extent scaling and relocating: resize, relocate texture objects ☺
        Reordering and recoloring: regenerate texture objects

    Use random sampling
        Users interactively set threshold
        Random sampling is triggered automatically

    Without sampling (16K data items)

    With sampling (5K data items)

    Out5D Dataset
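A minimal sketch of threshold-triggered sampling, assuming the user-set threshold is a maximum number of data items to draw and that sampling is uniform without replacement. The function name and the `seed` parameter are hypothetical.

```python
import random


def maybe_sample(items, threshold, seed=None):
    """Return all items if they fit under the threshold, else a random subset."""
    items = list(items)
    if len(items) <= threshold:
        return items  # small dataset: no sampling needed
    rng = random.Random(seed)
    return rng.sample(items, threshold)  # triggered automatically above the threshold


shown = maybe_sample(range(16_000), threshold=5_000, seed=1)
print(len(shown))
```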

  • 28

    55

    Discussion

    Is the pixel-oriented technique the only choice for generating dimension glyphs?
        Histogram, Scatterplot, …

    Is 2D MDS the only approach to lay out dimensions?
        3D MDS, SOM, Treemap, Animation…

    Is correlation the most informative relationship among dimensions?
        Multivariate relationships

    56

    Value and Relation Display: Interactive Visual Exploration of Large Datasets with Hundreds of Dimensions.

    J. Yang, D. Hubball, M. Ward, E. Rundensteiner and W. Ribarsky

    IEEE Transactions on Visualization and Computer Graphics 13(3)

  • 29

    57

    XRay Dimension Glyphs

    Each glyph: a scatterplot matrix
        X: a base dimension that is the same for all glyphs
        Y: the dimension it represents

    Density based display
        Bright: sparse
        Dark: dense

    Unoccupied area: semi-transparent

    58

    A real dataset with 89 dimensions and 10,417 data items in Pixel and XRay VaRs

    XRay Dimension Glyphs

  • 30

    59

    Jigsaw Map Dimension Layout

    Dimension hierarchy
    Using H-Curve to create a Jigsaw Map

    M. Wattenberg. A note on space-filling visualizations and space-filling curves. InfoVis 2005, pages 181–186

    60

    A real dataset with 838 dimensions and 11,413 data items in Pixel-Jigsaw VaR and XRay-Jigsaw VaR

    Jigsaw Map Dimension Layout

  • 31

    61

    Rainfall Dimension Layout

    Metaphor: Rain
    Center bottom: focus dimension D
    Speed of a dimension: related to its correlation to D
    Time: user controllable

    62

    Rainfall Dimension Layout

  • 32

    63

    Data Item Selection and Masking

    Visual query style data item selection
    Data item based masking

    (a) No mask (b) Opaque mask (c) Semi-transparent mask

    64

    Labeling

    (a) All labels are shown

    (b) Labels of selected dimensions are shown

    (c) Angled labels in Jigsaw map layout

  • 33

    65

    Possible Applications of VaR Display

    Interactively exploring high dimensional data
        Revealing data item relationships
        Revealing dimension relationships

    Guiding automatic data analysis
        Assessing results
        Manually tuning parameters

    Human-driven dimension reduction
        Constructing subspaces using selection tools
        Visualizing subspaces in VaR or other displays

    66

    Semantic Image Browser: Bridging Information Visualization with Automated Intelligent Image Analysis

    Jing Yang (1), Jianping Fan (1), Daniel Hubball (1), Yuli Gao (1), Hangzai Luo (1), William Ribarsky (1), and Matthew Ward (2)

    (1) University of North Carolina at Charlotte
    (2) Worcester Polytechnic Institute

    Acknowledgements: This work is supported by NVAC

  • 34

    67

    Motivation

    Interactive image exploration:
        Applications: personal image management, satellite image analysis, ...

    Background:
        Automated semantic image analysis
        Gap between semantic image analysis and image exploration

    Goals:
        Facilitate image exploration using analysis results
        Evaluate, monitor and improve analysis processes

    68

    Semantic Image Browser Overview

    Annotation engine
        Automated semantic image classification process

    Multiple coordinated views
        Image view – MDS, Rainfall, Sequential
        Content view – VaR

    Interactions
        Search by sample image
        Search by semantic content
        Interactive annotation examination and modification
        Zooming, panning, distortion

  • 35

    69

    Annotation Engine

    Content-Based Image Annotation [fan:2004]

    Low level visual features
    Semantic contents
    Semantic concepts

    Semantic contents: high dimensional dataset
        data items: images
        dimensions: contents
        values: 1 (image contains the content) or 0 (otherwise)
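The mapping from annotations to a high dimensional dataset can be sketched as a binary matrix with one row per image and one column per content. The image names, content names and function names below are made up for illustration; only the 1/0 encoding comes from the slide. A simple search by semantic content is included to show how the matrix is used.

```python
def content_matrix(annotations, contents):
    """One row per image, one 0/1 value per content (1: image contains the content)."""
    return {image: [1 if c in tags else 0 for c in contents]
            for image, tags in annotations.items()}


def search_by_content(matrix, contents, required):
    """Names of all images whose row has a 1 for every required content."""
    columns = [contents.index(c) for c in required]
    return sorted(image for image, row in matrix.items()
                  if all(row[k] == 1 for k in columns))


CONTENTS = ["sky", "water", "flower"]  # hypothetical content names
ANNOTATIONS = {                        # hypothetical annotation engine output
    "img_a": {"sky", "water"},
    "img_b": {"flower"},
    "img_c": {"sky"},
}
matrix = content_matrix(ANNOTATIONS, CONTENTS)
print(matrix["img_a"])
print(search_by_content(matrix, CONTENTS, ["sky"]))
```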

    70

    Image View – MDS layout

    Corel collection (1100 images, 20 contents)

  • 36

    71

    Navigation Tools

    72

    Image View – Rainfall Layout

  • 37

    73

    Content View

    VaR display [yang:2004]

    Content blocks
        Pixel-oriented techniques [Keim 94]
        Color assignment
            Unselected images: Red - 1, Gray - 0
            Selected images: Blue - 1, Light gray - 0

    MDS layout of content blocks

    Interactions

    Corel image collection (1100 images, 20 contents)

    74

    Search by Sample Image

  • 38

    75

    Search by Semantic Content

    76

    Annotation Evaluation and Modification

    Case 1: Redflower
    Case 2: Sailcloth

  • 39

    77

    User Study

    Subjects: 10 UNCC students

    Dataset: Corel dataset (20 contents, 1100 images)

    Systems compared:
        No annotation: unsorted thumbnails in Explorer
        Semantic contents: Semantic image browser
        Semantic concepts: thumbnails sorted by concepts in Explorer

    Tasks:
        Task 1: Find three given images
        Task 2: Find images with certain contents
        Task 3: Estimate percentage of images containing certain contents

    78

    User Study Results

    Task 1: Find three given images
        Result: Sorted Explorer was better than Semantic browser; Semantic browser was better than unsorted Explorer
        Major reason: Annotations at the semantic concept level were more “error tolerant”

    Task 2: Find images with certain contents
        Result was similar to Task 1

    Task 3: Estimate percentage of images containing certain contents
        Result: Semantic browser was faster and more accurate than sorted Explorer and unsorted Explorer

    Post-experiment questionnaire (1 to 10 scale):
        Semantic browser was preferred
        Semantic browser was useful

  • 40

    79

    Multivariate Visual Explanation for High Dimensional Datasets

    S. Barlowe, T. Zhang, Y. Liu, J. Yang and D. Jacobs

    VAST 2008

    80

    Worldview Gap

    Worldview Gap - gap between what is being shown and what actually needs to be shown to draw a straightforward representational conclusion for decision making

    - Amar and Stasko, InfoVis 2004 best paper

    Filling the Worldview Gap:
        Our approach - embedding automatic analysis into information visualization

  • 41

    81

    Multivariate Visual Explanation

    With Scott Barlowe, Tianyi Zhang, Yujie Liu, and Donald Jacobs

    82

    Motivation

    Understanding multivariate relationships is critical in a vast number of applications

    Example:

    Economic forecasting

  • 42

    83

    What is the relationship?

    Scatterplot Matrix

    Parallel Coordinates

    y0 = x0x1 + x2

    84

    Worldview Gap

    Worldview Gap - gap between what is being shown and what actually needs to be shown to draw a straightforward representational conclusion for decision making

    - Amar and Stasko, InfoVis 2004 best paper

  • 43

    85

    Multivariate Visual Explanation

    Goals:
        Multivariate relationship understanding
        Dimension reduction
        Model construction

    Approach: Integrate partial derivative calculation into multivariate visualization
        Partial derivative calculation and inspection
        Step by step visual exploration with interactive model construction and dimension reduction

    86

    Partial Derivative

    Derivative: measurement of how a function changes when values of its inputs change

    Example: derivative at a point in time of the position of a car: instantaneous speed

    Partial derivative of a function of several variables: derivative with respect to one of those variables with the others held constant
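A minimal numeric illustration, using the function y0 = x0x1 + x2 that appears elsewhere in this deck. Central finite differences on a known function are used here only to make the definition concrete: one input is nudged while the others are held constant. The paper works from data samples, and its estimation method is not shown on these slides; `partial_derivative` and the step size `h` are assumptions.

```python
def partial_derivative(f, point, index, h=1e-5):
    """Central-difference estimate of df/dx_index at point, other inputs held constant."""
    up, down = list(point), list(point)
    up[index] += h
    down[index] -= h
    return (f(*up) - f(*down)) / (2 * h)


def y0(x0, x1, x2):
    return x0 * x1 + x2


point = (2.0, 3.0, 5.0)
# Analytically: dy0/dx0 = x1 = 3, dy0/dx1 = x0 = 2, dy0/dx2 = 1
print([round(partial_derivative(y0, point, i), 6) for i in range(3)])
```

Note that the derivative with respect to x0 depends on x1 and vice versa, while the derivative with respect to x2 is constant.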

  • 44

    87

    Partial Derivative Inspection

    Partial derivative calculation introduces errors

    88

    Partial Derivative Inspection

    Visually present errors to users

    Error inspection of a segmented dataset:
        y = 8x0 + x1 if x0 ≥ 0.6 and x1 ≤ 0.3
        y = x0 − 7x1 otherwise

  • 45

    89

    Visual Exploration of Partial Derivatives

    Show all partial derivatives together with the original dimensions?

    Scalability challenge: 4-d dataset with dependent variable y and independent variables x0, x1, and x2
        1st order derivatives: ∂y/∂x0, ∂y/∂x1, ∂y/∂x2
        2nd order derivatives: ∂²y/∂x0², ∂²y/∂x0∂x1, ∂²y/∂x0∂x2, ∂²y/∂x1², ∂²y/∂x1∂x2, ∂²y/∂x2²

    Screen will be cluttered!
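The growth behind this clutter can be counted directly: n independent variables give n first-order derivatives and n(n+1)/2 distinct second-order derivatives (assuming mixed partials are symmetric), so 3 and 6 for the example above. A small sketch with a hypothetical helper name and ASCII labels:

```python
from itertools import combinations_with_replacement


def derivative_labels(variables):
    """List first- and second-order partial derivatives of y as text labels."""
    first = [f"dy/d{v}" for v in variables]
    second = [f"d2y/d{a}d{b}"
              for a, b in combinations_with_replacement(variables, 2)]
    return first, second


first, second = derivative_labels(["x0", "x1", "x2"])
print(len(first), len(second))
print(second)
```

For 10 independent variables the same count is already 10 first-order and 55 second-order derivatives, on top of the original dimensions.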

    90

    Visual Exploration of Partial Derivatives

    Examine all types of relationships from one display? Users would be overwhelmed!

  • 46

    91

    Step By Step Visual Exploration

    Different types of correlations are examined in different steps
    Correlations that are easier to detect are examined first
    Variables with detected relationships are excluded from further analysis

    92

    Step1: 1st Order Partial Derivative Histograms

    Display: the histograms of 1st order partial derivatives

    Information to be detected:
        Significant independent variables
        Ignorable independent variables
        Independent variables that linearly impact the dependent variable
        Whether an independent variable impacts the dependent variable positively or negatively
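What a user reads off such a histogram can be approximated in code, as a rough sketch. Assumptions: a variable is ignorable when all its derivative samples are near zero, linear when the samples barely vary, and positive or negative when all samples share a sign. The tolerances and the function name are hypothetical, and the slides describe a visual judgment, not an automatic rule.

```python
from statistics import pstdev


def describe_variable(derivatives, zero_tol=0.05, spread_tol=0.05):
    """Summarize samples of dy/dx for one independent variable x."""
    if max(abs(d) for d in derivatives) <= zero_tol:
        return "ignorable"  # histogram concentrated at zero
    # A narrow histogram means a near-constant slope, i.e. a linear impact.
    shape = "linear" if pstdev(derivatives) <= spread_tol else "non-linear"
    if all(d > 0 for d in derivatives):
        sign = "positive"
    elif all(d < 0 for d in derivatives):
        sign = "negative"
    else:
        sign = "mixed"
    return f"significant, {shape}, {sign}"


# For y0 = x0x1 + x2: dy0/dx2 is always 1, while dy0/dx0 equals x1 and varies.
print(describe_variable([1.0, 1.0, 1.0]))
print(describe_variable([0.2, 0.9, -0.5]))
print(describe_variable([0.0, 0.01, -0.02]))
```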

  • 47

    93

    94

  • 48

    95

    96

    Step 2: 1st Order Partial Derivatives vs. Original Dimensions Scatterplots

    Information: Entangled?

    Dataset: y0 = x0x1 + x2, 1000 data items

  • 49

    97

    Coordinated Visual Exploration

    98

    Model Construction

