
VisDB : Database exploration using Multidimensional Visualization

Date post: 24-Feb-2016
Description:
VisDB: Database exploration using Multidimensional Visualization. Daniel A. Keim, Hans-Peter Kriegel, Institute for Computer Science, University of Munich. Created by Rohan Ladkhedkar, Ajinkya Raulkar, Vrushali Date, Anuja Surgude. PowerPoint presentation.

Transcript


VisDB: Database exploration using Multidimensional Visualization
Daniel A. Keim, Hans-Peter Kriegel
Institute for Computer Science, University of Munich
3/23/2011

Created by: Rohan Ladkhedkar, Ajinkya Raulkar, Vrushali Date, Anuja Surgude

Contents
- Introduction to VisDB
- Basic Idea of VisDB
- Techniques used
  - Basic Visualization
  - Mapping 2D to Axes
  - Grouping the Dimensions
- Working
- Hardware/Software
- Future Scope
- Conclusion

Introduction to VisDB
Typical difficulties faced with large databases:

- Finding specific data
- No knowledge of database systems, query languages, and data models
- Identifying interesting data spots
- Exact (one-to-one) queries return multiple data items without giving the user any feedback

Introduction to VisDB (continued)
- Sorting the data items according to their relevance to the user's query.
- Visualizing as many data items as possible (e.g. ten million) at the same time to give the user feedback on the query. The resolution of current displays (1 to 3 million pixels) is an important consideration.
- Interaction between the system and the user.

Basic Idea of VisDB

VisDB supports the query specification process by visually representing the result.

Dimensions that are of no interest to the user are excluded from the visualization.

Each pixel of the screen is used to visualize one of the data items resulting from a query. Approximate results are determined using distance functions. These distances are then combined into a relevance factor, which is used for the mapping to colors.

Distance Function
For each selection predicate, the distance between the attribute value and the corresponding query value is determined. The distance functions are data-type and application dependent. In some cases, even a single data type can have multiple distance functions. Distance functions by data type:
- Number types (e.g. integers): numerical difference
- Ordinal types (e.g. grades): domain-specific distance functions
- Nominal types (e.g. professions): distance matrix
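The three kinds of distance functions can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not VisDB's code. The grade scale `GRADE_ORDER`, the distance-matrix entries in `PROFESSION_DISTANCE`, and all function names are hypothetical.

```python
# Hypothetical per-type distance functions (illustrative values only).

def numeric_distance(value: float, query_value: float) -> float:
    """Number types: the numerical difference.

    The sign is kept so that the direction of the distance stays available.
    """
    return value - query_value


# Ordinal types: a domain-specific function.
# Assumption: grades are ordered, and the distance is the number of steps between them.
GRADE_ORDER = ["A", "B", "C", "D", "F"]  # hypothetical grade scale


def ordinal_distance(value: str, query_value: str) -> int:
    return abs(GRADE_ORDER.index(value) - GRADE_ORDER.index(query_value))


# Nominal types: a distance matrix. The entries are invented for illustration.
PROFESSION_DISTANCE = {
    ("engineer", "scientist"): 1,
    ("engineer", "artist"): 4,
    ("scientist", "artist"): 3,
}


def nominal_distance(value: str, query_value: str) -> int:
    if value == query_value:
        return 0
    # The matrix is symmetric, so only one order of each pair is stored.
    if (value, query_value) in PROFESSION_DISTANCE:
        return PROFESSION_DISTANCE[(value, query_value)]
    return PROFESSION_DISTANCE[(query_value, value)]  # KeyError for unknown pairs


print(ordinal_distance("B", "D"))
```

Keeping the sign in `numeric_distance` preserves the direction of the distance, which the axis-mapping technique relies on. Take `abs()` of the result before normalizing if only the magnitude matters.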

Combining Distances into Relevance Factor
The independently calculated distances of the different selection predicates must be combined. The combined value should have a global meaning, so user interaction is required:
- Weighting factors w_j, j = 1, ..., #sp (where #sp is the number of selection predicates), are obtained from the user according to the order of importance.
- All distances are normalized by a linear transformation of the range [d_min, d_max] of each predicate to a fixed range, e.g. [0, 255].

The normalized distances are combined using numerical mean functions:
1. Weighted arithmetic mean for AND-connected condition parts.
2. Weighted geometric mean for OR-connected condition parts.

The relevance factor is the inverse of the combined distance value.
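Normalization and combination can be sketched in Python as follows. The sketch assumes that each predicate's distances are already non-negative magnitudes. The function names are hypothetical. Reading "inverse" as reversing the normalized 0 to 255 scale, rather than taking 1/d (which is undefined at d = 0), is also an assumption.

```python
import math


def normalize(distances: list[float], lo: float = 0.0, hi: float = 255.0) -> list[float]:
    """Linearly map one predicate's range [d_min, d_max] onto [lo, hi]."""
    d_min, d_max = min(distances), max(distances)
    if d_max == d_min:  # all items equally distant: avoid division by zero
        return [lo for _ in distances]
    scale = (hi - lo) / (d_max - d_min)
    return [lo + (d - d_min) * scale for d in distances]


def weighted_arithmetic_mean(ds: list[float], ws: list[float]) -> float:
    """AND-connected part: the result is small only if all distances are small."""
    return sum(w * d for d, w in zip(ds, ws)) / sum(ws)


def weighted_geometric_mean(ds: list[float], ws: list[float]) -> float:
    """OR-connected part: the result is 0 as soon as one weighted distance is 0."""
    pairs = [(d, w) for d, w in zip(ds, ws) if w > 0]  # zero weight: predicate ignored
    if any(d == 0 for d, _ in pairs):
        return 0.0
    total = sum(w for _, w in pairs)
    return math.exp(sum(w * math.log(d) for d, w in pairs) / total)


def relevance(combined_distance: float, d_max: float = 255.0) -> float:
    """Assumed reading of 'inverse': distance 0 maps to the highest relevance."""
    return d_max - combined_distance
```

Consider an item with normalized distances 0 and 255 on two equally weighted predicates. The AND combination gives 127.5, while the OR combination gives 0 (relevance 255), because the first predicate alone already satisfies the OR-connected condition.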

Formula for calculating the combined distance
[Formula: combined distance, shown as an image on the original slide]
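As a hedged stand-in for the missing formula, the normalization and the two weighted means described above can be written in their standard forms. Here d_j is the distance for selection predicate j, and d_min, d_max are taken separately for each predicate. The tilde notation, the AND/OR subscripts, and the relevance line r = 255 - d (which reads "inverse" as reversing the normalized scale) are assumptions, not the slide's notation.

```latex
\[
\tilde d_j = \frac{d_j - d_{\min}}{d_{\max} - d_{\min}} \cdot 255,
\qquad j = 1, \dots, \#sp
\]
\[
d_{\text{AND}} = \frac{\sum_{j=1}^{\#sp} w_j \, \tilde d_j}{\sum_{j=1}^{\#sp} w_j},
\qquad
d_{\text{OR}} = \Bigl( \prod_{j=1}^{\#sp} \tilde d_j^{\,w_j} \Bigr)^{1 / \sum_{j=1}^{\#sp} w_j}
\]
\[
r = 255 - d, \qquad d \in \{ d_{\text{AND}}, d_{\text{OR}} \}
\]
```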

Reducing the amount of data to be displayed
Adequate heuristics are required to reduce the amount of data and to determine the data items whose distances are to be displayed. Hence, a quantile of the distances is defined as the lowest value such that:
[Formula: quantile definition, shown as an image on the original slide]
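The quantile threshold can be sketched in Python as follows. The sketch assumes that the share of data to display is given as an integer percentage (`percent`, a parameter name introduced here) and that the inputs are combined, non-negative distances. The function names are hypothetical. The ceiling uses integer arithmetic to avoid floating-point rounding at exact boundaries.

```python
def quantile_threshold(distances: list[float], percent: int) -> float:
    """Lowest value q such that at least `percent`% of the distances are <= q."""
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    if not distances:
        raise ValueError("no distances given")
    ordered = sorted(distances)
    # Number of items that must be covered: ceil(percent * n / 100), computed exactly.
    k = (percent * len(ordered) + 99) // 100
    return ordered[k - 1]


def items_to_display(distances: list[float], percent: int) -> list[int]:
    """Indices of the items whose distance lies within the quantile threshold."""
    q = quantile_threshold(distances, percent)
    return [i for i, d in enumerate(distances) if d <= q]
```

Ties at the threshold are all kept, so slightly more than `percent`% of the items may be returned.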

Techniques Used
Three techniques are used:
1. Basic Visualization Technique
2. Mapping Two Dimensions to the Axes
3. Grouping the Dimensions for each Data Item

1. Basic Visualization Technique
The data items are sorted according to their relevance with respect to the query, and the relevance factors are then mapped to colors. Sorting is needed to avoid sprinkled images, which are hard for the user to interpret. The data items with the highest relevance factors are centered in the middle of the window, and the approximate answers form a rectangular spiral around this region. Exact (100% correct) answers are colored yellow.
The colors range from yellow in the middle through green, blue, and red to black. These ranges denote the distance from the correct answers.
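The sorting and spiral placement can be sketched in Python as follows. The function names are hypothetical. The spiral's starting direction and turning order (right, then down, in screen coordinates where y grows downward) are arbitrary choices and are not taken from the slides.

```python
from collections.abc import Iterator


def spiral_positions(width: int, height: int) -> Iterator[tuple[int, int]]:
    """Yield every pixel of the window once, spiraling outward from the center."""
    x, y = width // 2, height // 2
    dx, dy = 1, 0  # start moving right
    run = 1
    emitted, total = 0, width * height
    while emitted < total:
        for _ in range(2):  # run lengths go 1, 1, 2, 2, 3, 3, ...
            for _ in range(run):
                if 0 <= x < width and 0 <= y < height:  # skip points outside the window
                    yield x, y
                    emitted += 1
                    if emitted == total:
                        return
                x, y = x + dx, y + dy
            dx, dy = -dy, dx  # turn 90 degrees (right -> down -> left -> up)
        run += 1


def layout(relevances: list[float], width: int, height: int) -> dict[tuple[int, int], int]:
    """Map pixel -> item index, with the most relevant items at the center."""
    order = sorted(range(len(relevances)), key=lambda i: relevances[i], reverse=True)
    return dict(zip(spiral_positions(width, height), order))
```

When there are more items than pixels, `zip` silently drops the least relevant ones. This is the point at which reducing the amount of displayed data matters.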

Multidimensional visualization: a separate window is generated for each selection predicate of the query.
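The color scale and the per-predicate windows can be sketched in Python as follows. Several choices here are assumptions: linear interpolation in RGB between the five colors named on the slides (the original system's exact color scale is not given), and drawing each data item at the same position in every window. `ANCHORS` and the function names are hypothetical.

```python
# Anchor colors named on the slides, from exact answers (distance 0) to the farthest items.
ANCHORS = [
    (255, 255, 0),  # yellow: exact (100% correct) answers
    (0, 255, 0),    # green
    (0, 0, 255),    # blue
    (255, 0, 0),    # red
    (0, 0, 0),      # black: largest distances
]


def distance_to_color(d: float, d_max: float = 255.0) -> tuple[int, int, int]:
    """Linearly interpolate in RGB between neighboring anchor colors (assumed scale)."""
    t = min(max(d / d_max, 0.0), 1.0) * (len(ANCHORS) - 1)  # position along the anchors
    i = min(int(t), len(ANCHORS) - 2)                         # which segment
    frac = t - i
    a, b = ANCHORS[i], ANCHORS[i + 1]
    return tuple(round(ca + (cb - ca) * frac) for ca, cb in zip(a, b))


def predicate_windows(per_predicate: list[list[float]],
                      positions: list[tuple[int, int]]) -> list[dict]:
    """One window per selection predicate; item k is drawn at positions[k] in each window."""
    return [
        {pos: distance_to_color(d) for pos, d in zip(positions, distances)}
        for distances in per_predicate
    ]
```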

Question 1: Which color denotes 100% correct answers in the basic visualization technique?

1) Red
2) Yellow
3) Green
4) White
5) Blue

Answer 1: Correct answer: 2 (Yellow)

2. Mapping Two Dimensions to the Axes
Although 2D and 3D visualizations are useful, they are not pursued here for two reasons: they can display only a limited number of data items, and such systems already exist. The improvement is to provide feedback on the direction of the distance in the visualization.
Two dimensions are assigned to the axes, and the relevance factors are arranged according to the direction of the distance. For one dimension, negative distances are arranged to the left and positive distances to the right. For the other dimension, negative distances are arranged at the bottom and positive ones at the top.

[Figure: 2D arrangement of one dimension]
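The direction rule can be sketched in Python as a quadrant assignment. The sketch only shows which part of the window an item falls into and the relevance ordering within each part. It does not cover exact pixel placement. Treating a zero distance as right/top, and the names `quadrant` and `arrange`, are assumptions.

```python
from collections import defaultdict


def quadrant(dx: float, dy: float) -> tuple[str, str]:
    """Negative x-distance -> left, positive -> right; negative y-distance -> bottom, positive -> top."""
    horizontal = "left" if dx < 0 else "right"  # zero treated as right (assumption)
    vertical = "bottom" if dy < 0 else "top"    # zero treated as top (assumption)
    return vertical, horizontal


def arrange(items: list[tuple[str, float, float, float]]) -> dict[tuple[str, str], list[str]]:
    """items: (item_id, dx, dy, relevance). Returns quadrant -> ids, most relevant first."""
    buckets = defaultdict(list)
    for item_id, dx, dy, rel in items:
        buckets[quadrant(dx, dy)].append((rel, item_id))
    return {q: [i for _, i in sorted(v, key=lambda t: -t[0])] for q, v in buckets.items()}
```

If a query yields no items whose distances have a given pair of signs, the corresponding bucket stays empty, and that part of the window is left blank.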

Problems in this method
- Corners of the window may be completely empty.
- In the worst case, two diagonally opposite corners of the window are completely empty, so only half of the data items can be presented.
- Maximizing the number of displayed data items conflicts with arrangements that assign multiple dimensions to one axis.

Question 2: In one dimension, negative distances are arranged:
1) at the bottom
2) to the right
3) at the top
4) to the left

Answer 2: Correct answer: 4 (to the left)

3. Grouping the Dimensions for each Data Item
All dimensions of one data item are grouped together in one area, so visualizations generated with this arrangement consist of only one window. Shape is not used to distinguish data items, and the criterion for arranging the data items is also different. Each dimension needs 2x2 pixels, as opposed to 1 pixel per dimension in the previous two methods.
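The following Python sketch shows one possible layout for a data item's group, along with the capacity arithmetic behind the comparison of 2x2 pixels versus 1 pixel per dimension. The near-square grid of cells is an assumption and is not necessarily the arrangement in the original figure. The capacity bound ignores unused cells and window borders. All names are hypothetical.

```python
import math

CELL = 2  # each dimension occupies a 2x2 pixel cell


def group_shape(num_dims: int) -> tuple[int, int]:
    """Columns and rows of cells in one item's group (assumed near-square grid)."""
    cols = math.ceil(math.sqrt(num_dims))
    rows = math.ceil(num_dims / cols)
    return cols, rows  # e.g. 5 dimensions -> 3 x 2 cells, one cell unused


def dimension_pixels(item_origin: tuple[int, int], dim: int, num_dims: int) -> list[tuple[int, int]]:
    """The four pixels showing dimension `dim` of the item whose group starts at item_origin."""
    cols, _ = group_shape(num_dims)
    gx, gy = item_origin
    cx = gx + (dim % cols) * CELL
    cy = gy + (dim // cols) * CELL
    return [(cx + i, cy + j) for j in range(CELL) for i in range(CELL)]


def capacity(screen_pixels: int, num_dims: int, pixels_per_dim: int) -> int:
    """Upper bound on displayable items, ignoring unused cells and borders."""
    return screen_pixels // (num_dims * pixels_per_dim)
```

For a 1,000,000-pixel screen and 5 dimensions, the bound is 50,000 items at 2x2 pixels per dimension versus 200,000 items at 1 pixel per dimension, i.e. one-fourth as many.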

[Figure: Grouping arrangement for 5-dimensional data]

The grouping arrangement is suitable only for focused searches on smaller data sets, because only one-fourth of the data items can be displayed on the screen at one time. It still provides better visualizations for data sets of higher dimensionality. In the other two techniques, the pixels for each dimension of a data item are related only by their position.

Working
The interface is divided into the visualization portion on the left and the query modification portion on the right. The visualization portion displays the resulting data set, including a certain percentage of approximate answers, using one of the visualization methods. The query modification portion provides sliders for modifying the selection predicates and weighting factors, as well as some other options.

Working (continued)
There are different kinds of sliders, e.g. sliders for numbers, sliders for discrete types, and sliders for non-metric types (ordinal and nominal data types). Other parameters listed are:
- Number of results
- Query range
- Weighting factors
- Data values for the selected tuple
- Data values corresponding to a selected color range

Changing the percentage of data being displayed may completely change the visualization, because the distance values are normalized according to the new range.
- Normal mode: the system recalculates the visualization after each modification of the query.
- Auto-recalculate off mode: queries are recalculated only on demand.

Question 3: Into which two sections is VisDB mainly divided?

1) Visualization portion
2) Grouping dimensions
3) Query modification
4) Coloration of relevance factors

Answer 3: Correct answers: 1 and 3 (visualization portion and query modification)

Question 4: In which mode does the system recalculate the visualization after each modification of the query?

1) Normal mode
2) Auto-recalculate mode
3) Visual mode
4) None of the above

Answer 4: Correct answer: 1 (Normal mode)

[Figure: Example (1000 data items)]


[Figure: Example (7000 data items)]


Hardware/Software
Software used: C++, Motif
Hardware used: HP 7xx machines running X Windows
The current version is main-memory based and allows interactive database exploration for databases containing 50,000 data items.

Future Scope
- Automatic generation of queries that correspond to a specific region in one of the visualization windows.
- Generating time series of visualizations corresponding to queries that are changed incrementally.
- Applying VisDB to many different application domains, each with its own parameters, distance functions, query requirements, and so on.

Conclusion
- VisDB allows visualization of the largest amount of data that can be displayed at one time on current displays.
- It provides valuable feedback when querying the database.
- It allows the user to find results that would otherwise remain hidden in the database.

Thank you

