
Daniel A. Keim, Hans-Peter Kriegel Institute for Computer Science, University of Munich 3/23/2011 1...

Date post: 13-Dec-2015
Upload: katherine-gray

Transcript
  • Slide 1

Daniel A. Keim, Hans-Peter Kriegel
Institute for Computer Science, University of Munich
3/23/2011
VisDB: Database Exploration Using Multidimensional Visualization

  • Slide 2

Created by
    Rohan Ladkhedkar
    Ajinkya Raulkar
    Vrushali Date
    Anuja Surgude

  • Slide 3

Contents
    Introduction to VisDB
    Basic Idea of VisDB
    Techniques Used
        Basic Visualization
        Mapping Two Dimensions to the Axes
        Grouping the Dimensions
    Working
    Hardware/Software
    Future Scope
    Conclusion

  • Slide 4

Introduction to VisDB
Typical difficulties faced with large databases:
    Finding a specific data item
    No knowledge of database systems, query languages, or the data model
    Intersection data spots
    1 to 1 queries provide multiple data items with no feedback

  • Slide 5

Introduction to VisDB
    Sorting the data items according to the user's query.
    Visualizing as many data items as possible at the same time (say, out of ten million) to give the user feedback on the query.
    The resolution of current displays (1 to 3 million pixels) is an important consideration.
    Interaction between the system and the user.

  • Slide 6

Basic Idea of VisDB
    Supports the query specification process by visually representing the result.
    Restricts the visualization to the dimensions of interest, leaving out those of no interest to the user.

  • Slide 7

Basic Idea of VisDB
    Each pixel of the screen is used to visualize one of the data items resulting from a query.
    Approximate results are determined using distance functions.
    These distances are then combined into a relevance factor, which is used for the mapping to the display.

  • Slide 8

Distance Function
    The distance between each attribute value and the corresponding query value is determined.
    The distance functions used are data type and application dependent.
    In some cases, multiple distance functions can be used even for a single data type.
Calculating distances for:
    1. Number types (e.g. integers): numerical difference
    2. Ordinal types (e.g. grades): domain-specific distance functions
    3. Nominal types (e.g. professions): distance matrix

  • Slide 9

Combining Distances into Relevance Factor
    Combine the independently calculated distances of the different selection predicates.
    The combined distance should have a global meaning, which requires user interaction.
    Obtain weighting factors (w_j, j = 1, ..., #sp) from the user, reflecting the order of importance of the selection predicates.
    Normalize all distances: linear transformation of the range [d_min, d_max] of each predicate to a fixed range, e.g. [0, 255].

  • Slide 10

Combining Distances into Relevance Factor
For combining the normalized distances, numerical mean functions are used:
    1. Weighted arithmetic mean for AND-connected condition parts.
    2. Weighted geometric mean for OR-connected condition parts.
The relevance factor is the inverse of the combined distance value.

  • Slide 11

Formula for calculating combined distance

  • Slide 12

Reducing the amount of data to be displayed
Adequate heuristics are required to:
    1. Reduce the amount of data
    2. Determine the data items whose distances are to be displayed.
Hence a quantile is defined as the lowest value such that: [formula not preserved in the transcript]

  • Slide 13

Techniques Used
Three techniques are used:
    1. Basic visualization technique
    2. Mapping two dimensions to the axes
    3. Grouping the dimensions for each data item

  • Slide 14

1. Basic Visualization Technique
    Sorts the data according to relevance with respect to the query, then maps the relevance factors to colors.
    Sorting is needed to avoid sprinkled images, which are not clear to the user.
    The data items with the highest relevance factors are centered in the middle of the window.
    Approximate answers form a rectangular spiral around this region (100% correct answers are colored yellow).

  • Slide 15

1. Basic Visualization Technique
    Color ranges from yellow in the middle over green, blue, and red to black at the outside.
    These color ranges denote the distance from the correct answers.

  • Slide 16

1. Basic Visualization Technique
    Multidimensional visualization: a separate window is generated for each selection predicate of the query.
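The steps from slides 9 to 14 (normalize, combine, sort, arrange in a spiral) can be sketched in Python as follows. This is a minimal illustration, not the VisDB implementation: the function names, the counter-clockwise spiral direction, and the handling of equal distances are assumptions, and only the AND case (weighted arithmetic mean) is shown.

```python
from typing import Dict, List, Tuple


def normalize(distances: List[float], lo: float = 0.0, hi: float = 255.0) -> List[float]:
    """Linearly map distances from [d_min, d_max] onto the fixed range [lo, hi]."""
    d_min, d_max = min(distances), max(distances)
    if d_max == d_min:  # assumption: identical distances all map to lo
        return [lo for _ in distances]
    scale = (hi - lo) / (d_max - d_min)
    return [lo + (d - d_min) * scale for d in distances]


def combine_and(per_predicate: List[List[float]], weights: List[float]) -> List[float]:
    """Weighted arithmetic mean of the normalized distances, one value per data item."""
    total = sum(weights)
    n_items = len(per_predicate[0])
    return [
        sum(w * column[i] for w, column in zip(weights, per_predicate)) / total
        for i in range(n_items)
    ]


def spiral_positions(n: int) -> List[Tuple[int, int]]:
    """First n cells of a rectangular spiral; (0, 0) stands for the window center."""
    positions = []
    x = y = 0
    dx, dy = 1, 0
    run, steps_in_run, runs_done = 1, 0, 0
    for _ in range(n):
        positions.append((x, y))
        x, y = x + dx, y + dy
        steps_in_run += 1
        if steps_in_run == run:
            steps_in_run = 0
            dx, dy = -dy, dx          # turn 90 degrees
            runs_done += 1
            if runs_done % 2 == 0:    # run length grows after every second turn
                run += 1
    return positions


def arrange(combined: List[float]) -> Dict[Tuple[int, int], int]:
    """Place item indices on the spiral, smallest combined distance at the center."""
    order = sorted(range(len(combined)), key=lambda i: combined[i])
    return dict(zip(spiral_positions(len(order)), order))


# Hypothetical distances of four data items for two selection predicates.
d1 = normalize([0, 10, 5, 20])
d2 = normalize([0, 2, 4, 1])
combined = combine_and([d1, d2], weights=[3, 1])
layout = arrange(combined)
```

Here item 0 matches both predicates exactly, so it lands on the center cell; the remaining items follow along the spiral in order of increasing combined distance. A color map from yellow to black would then be applied to the combined values.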
  • Slide 17

Question 1:
Which color denotes 100% correct answers in the Basic Visualization Technique?
    1. Red
    2. Yellow
    3. Green
    4. White
    5. Blue

  • Slide 18

Answer 1:
Correct answer: 2

  • Slide 19

2. Mapping Two Dimensions to Axes
Although 2D and 3D visualizations are useful, they are not pursued further because:
    They can show only a limited number of data items.
    Such systems already exist.
Improvement:
    Adding feedback on the direction of the distance to the visualization.

  • Slide 20

2. Mapping Two Dimensions to Axes
    Assign two dimensions to the axes.
    Arrange the relevance factors according to the direction of the distance.
    For one dimension: negative distances to the left, positive distances to the right.
    For the other dimension: negative distances to the bottom, positive distances to the top.

  • Slide 21

2D arrangement of one dimension

  • Slide 22

Problems in this method
    A corner of the window may be completely empty.
    Worst case: two diagonally opposite corners of the window are completely empty, so only half as many data items can be presented.
    Maximizing the number of displayed data items conflicts with arrangements that have multiple dimensions assigned to the axes.

  • Slide 23

Question 2:
In one dimension, negative distances are arranged
    1) at the bottom
    2) to the right
    3) at the top
    4) to the left

  • Slide 24

Answer 2:
Correct answer: 4

  • Slide 25

3. Grouping the Dimensions for each Data Item
    All dimensions of one data item are grouped together in one area.
    Visualizations generated using this arrangement consist of only one window.
    Shape is not used to distinguish data items, and the criterion and arrangement of the data items are also different.
    2x2 pixels per dimension are needed, as opposed to 1 pixel per dimension in the previous two methods.
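The axis-mapping rule from slides 20 to 22 can be sketched in Python as follows. This is a hedged illustration under stated assumptions: the names `sign` and `group_by_quadrant` are made up for the example, items with a zero distance are given their own group, and ordering items inside a quadrant by the sum of absolute distances is an assumption, not something the slides specify.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def sign(value: float) -> int:
    """Return -1, 0, or 1 depending on the direction of a signed distance."""
    return (value > 0) - (value < 0)


def group_by_quadrant(
    signed: List[Tuple[float, float]]
) -> Dict[Tuple[int, int], List[int]]:
    """Group item indices by the direction of their two signed distances.

    Key (-1, 1) means: negative in the first dimension (left) and
    positive in the second dimension (top).
    """
    groups: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for index, (dx, dy) in enumerate(signed):
        groups[(sign(dx), sign(dy))].append(index)
    for members in groups.values():
        # Assumption: closest items first, using the sum of absolute distances.
        members.sort(key=lambda i: abs(signed[i][0]) + abs(signed[i][1]))
    return dict(groups)


# Hypothetical signed distances (first dimension, second dimension) per item.
signed_distances = [(0, 0), (-3, 2), (4, 5), (-1, 1), (2, -2)]
quadrants = group_by_quadrant(signed_distances)
empty_bottom_left = (-1, -1) not in quadrants
```

In this example no item has negative distances in both dimensions, so the bottom-left corner of the window would stay empty, which is the problem described on slide 22.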
  • Slide 26

Grouping arrangement for 5-dimensional data

  • Slide 27

Contd.
    The grouping arrangement is only suitable for a focused search on smaller data sets, because only one-fourth of the data items can be displayed on screen at one time.
    It still provides better visualizations for data sets with larger dimensionality.
    In the other two techniques, the pixels for each dimension of a data item are related only by their position.

  • Slide 28

Working
    The screen is divided into the Visualization portion on the left and the Query Modification portion on the right.
    In the Visualization portion, the resulting data set, including a certain percentage of approximate answers, is displayed using one of the visualization methods.
    In the Query Modification portion, sliders for modifying the selection predicates and weighting factors are provided, along with some other options.

  • Slide 29

  • Slide 30

Working contd.
Different kinds of sliders are provided, for example:
    Sliders for numbers
    Sliders for discrete types
    Sliders for non-metric types (ordinal and nominal data types)
Other parameters listed are:
    Number of results
    Query range
    Weighting factors
    Data values for a selected tuple
    Data values corresponding to a selected color range

  • Slide 31

Working contd.
    Changing the percentage of data being displayed may completely change the visualization, because the distance values are normalized according to the new range.
    Normal mode: the system recalculates the visualization after each modification of the query.
    Auto-Recalculate Off mode: queries are only recalculated on demand.

  • Slide 32

Question 3:
Into which two sections is VisDB mainly divided?
    1. Visualization Portion
    2. Grouping Dimensions
    3. Query Modification
    4. Coloration of Relevance Factors

  • Slide 33

Answer 3:
Correct answer: 1 and 3

  • Slide 34

Question 4
In which mode does the system recalculate the visualization after each modification of the query?
    1. Normal Mode
    2. Auto Recalculate Mode
    3. Visual Mode
    4. None of the above

  • Slide 35

Answer 4:
Correct answer: 1

  • Slide 36

Example (1000 data items)

  • Slide 37

Example (1000 data items)

  • Slide 38

Example (7000 data items)

  • Slide 39

Example (7000 data items)

  • Slide 40

Hardware/Software
    Software used: C++, MOTIF
    Hardware used: X-Windows on HP 7xx machines
    The current version is main-memory based and allows interactive database exploration for databases containing 50,000 data items.

  • Slide 41

Future Scope
    Automatic generation of queries that correspond to a specific region in one of the visualization windows.
    Generating time series of visualizations corresponding to queries that are changed incrementally.
    Applying the system to many different application domains, each with its own parameters, distance functions, query requirements, and so on.

  • Slide 42

Conclusion
    VisDB allows visualization of the largest amount of data that can be displayed at one time on current displays.
    It provides valuable feedback when querying the database.
    It allows the user to find results which would otherwise remain hidden in the database.

  • Slide 43

Thank you

