Polaris

Date post: 22-Feb-2016
Description:
A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases, by Chris Stolte and Pat Hanrahan, Stanford University. Presenter: Ganesh Viswanathan. CIS6930: Data Science: Large-scale Advanced Data Analysis, University of Florida, September 15, 2011.
Transcript

Data Science class presentation: POLARIS

Polaris
Area: Data visualization & Interface design
A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases
By Chris Stolte and Pat Hanrahan, Stanford University

Presenter: Ganesh Viswanathan
CIS6930: Data Science: Large-scale Advanced Data Analysis
University of Florida
September 15, 2011

Motivation
-- Large multi-dimensional databases have become very common:
   -- corporate data warehouses: Amazon, Walmart
   -- scientific projects: Human Genome Project, Sloan Digital Sky Survey
-- A major challenge for these huge databases is to extract meaning from the data they contain, such as:
   -- to discover structure,
   -- to find patterns, and
   -- to derive causal relationships.
-- Tools are needed for exploration and analysis of these databases.

The exploratory analysis process is one of hypothesis, experiment, and discovery. The path of exploration is unpredictable, and analysts need to be able to rapidly change both what data they are viewing and how they are viewing that data.

Existing Tools: Charts
-- typically provide a gallery of charts
-- hard to explore iteratively
-- simple charts can display only a few dimensions

Existing tools: Pivot Tables

-- common interface to data warehouses
-- simple interface based on drag-and-drop
-- generate text tables from databases

-- The most popular interface to multi-dimensional databases.
-- Allow the data cube to be rotated so that different dimensions of the dataset may be encoded as rows or columns of the table.
-- The remaining dimensions are aggregated and displayed as numbers in the cells of the table.
-- Cross-tabulations and summaries are then added to the resulting table of numbers.
-- Finally, graphs may be generated from the resulting tables.

Pivot Tables
Multi-dimensional databases are often treated as n-dimensional data cubes.

Pivot Tables allow rotation of multi-dimensional datasets, allowing different dimensions to assume the rows and columns of the table, with the remaining dimensions being aggregated within the table.
[Figure: data cube with dimensions TIME (1-7), PRODUCT (Toothpaste, Juice, Cola, Milk, Cream, Soap), and REGION (W, S, N)]
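
As a rough illustration of the rotation described above, the following Python sketch builds two pivot views from the same records. The records, the field names (`time`, `product`, `region`, `sales`), and the `pivot` helper are all made up for this example; only the idea of placing two dimensions on rows and columns and aggregating the rest comes from the slide.

```python
from collections import defaultdict

# Hypothetical fact records: three dimensions (time, product, region)
# and one measure (sales). None of these values come from the slides.
RECORDS = [
    {"time": 1, "product": "Juice", "region": "N", "sales": 10},
    {"time": 1, "product": "Juice", "region": "S", "sales": 5},
    {"time": 1, "product": "Cola", "region": "N", "sales": 7},
    {"time": 2, "product": "Juice", "region": "N", "sales": 3},
    {"time": 2, "product": "Cola", "region": "W", "sales": 8},
]


def pivot(records, row_dim, col_dim, measure):
    """Place row_dim on rows and col_dim on columns.

    Every other dimension is aggregated away by summing the measure
    into the (row, column) cell.
    """
    cells = defaultdict(int)
    for rec in records:
        cells[(rec[row_dim], rec[col_dim])] += rec[measure]
    return dict(cells)


# Two "rotations" of the same cube: only the choice of axes changes.
by_product_time = pivot(RECORDS, "product", "time", "sales")
by_region_product = pivot(RECORDS, "region", "product", "sales")
```

Both views summarize the same underlying data, so the cell totals of each view add up to the same grand total.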

Pivot Table Example: Baseball data
By dragging and dropping the dimensions to and from the left-hand column, top row, upper-left corner, and central data area (where the remaining dimensions are aggregated), one can change the Pivot Table view. Any of these views can be subsequently graphed.

Relational databases
-- Each row in a table is a basic entity (tuple)
-- Each column represents a field
-- Fields can be ordinal or quantitative

Relational Data Schema

Structural description of data sets
Primitives: attributes, tuples and relations

Motivation
Relational data schema enables flexible database design

No corresponding flexible ways to construct effective UI and visualization:
-- unique data schema, unique visualization/coordination
-- database keeps changing
-- different views for same data

Mismatch in design capabilities
                 Relational Databases    Traditional Visualization
Design Goal      Data design             Visualization design
Design Method    Data schema             Program code
Designer         Data owner              Programmer only
Design Change    Rapid, dynamic          Slow, static
Adaptability     Flexible                Brittle

Requirements on UI for Analysis and Exploration
Data dense displays: display both many tuples and many dimensions

Multiple display types: different displays suited to different tasks

Exploratory interfaces: rapidly change data transformations and views

Data dense: Analysts need to be able to create visualizations that simultaneously display many dimensions of large subsets of the data.
Multiple display types: Analysis consists of many different tasks, such as discovering correlations between variables, finding patterns in the data, locating outliers, and uncovering structure. An analysis tool must be able to generate displays suited to each of these tasks.
Exploratory interface: The analysis process is often an unpredictable exploration of the data. Analysts must be able to rapidly change what data they are viewing and how they are viewing that data.

Polaris
Polaris is an interface for the exploration of multi-dimensional databases that extends the pivot table interface to directly generate a rich, expressive set of graphical displays.

Polaris Design Goals
-- Generate rich table-based graphical displays rather than tables of text
-- Single conceptual model for both graphs and tables
-- Preserve ability to rapidly construct displays
-- Interactive analysis and exploration versus static visualization
-- Simple, consistent interface
-- Ease analysis and exploration:
   -- Want to extract meaning from data
   -- Process of hypothesis, experiment, and discovery
   -- Path of exploration is unpredictable

Excel Pivot tables provide a simple interface for building text-based tables

Graphs require multiple steps: different interfaces and conceptual models

Want to unify tables, graphs, and database queries in one interface

Features of Polaris
Builds tables using an algebraic formalism involving the fields of the database.

Each table consists of layers and panes, and each pane may be a different graphic.

An interface for constructing visual specifications of table-based graphical displays.

The state of the interface can be interpreted as a visual specification of the analysis task and automatically compiled into data and graphical transformations.

Features of Polaris
The visual specifications can be rapidly and incrementally developed, giving users visual feedback as they construct complex queries and visualizations.

Ability to generate a precise set of relational queries from the visual specifications.

Users can incrementally construct complex queries, receiving visual feedback as they assemble and alter the specifications.

Features of Polaris
Addresses these demands by providing an interface for rapidly and incrementally generating table-based displays.
A table consists of a number of rows, columns, and layers.
Each table axis may contain multiple nested dimensions.
Each table entry, or pane, contains a set of records that are visually encoded as a set of marks to create a graphic.

Visualizing Multidimensional Data
Several characteristics of tables make them particularly effective for displaying multi-dimensional data:

Multivariate - multiple dimensions of the data can be explicitly encoded in the structure of the table, enabling the display of high-dimensional data.

Comparative - tables generate small multiple displays of information, which are easily compared, exposing patterns and trends across dimensions of the data.

Familiar - Statisticians are accustomed to using tabular displays of graphs, such as scatterplot matrices and Trellis displays, for analysis. Pivot Tables are a common interface to large data warehouses.

-- multivariate: multiple dimensions can be encoded in the structure of the table
-- comparative: tables generate small-multiple displays of information
-- familiar: users are accustomed to tabular displays

Polaris Display: UI

Design Decision: Use a Formalism
Why a formalism?
-- unification: unify tables and graphs
-- expressiveness: build visualizations designers did not think of
-- interface simplicity: clearly defined semantics and operations
-- code simplicity: composable language versus monolithic objects
-- declarative: can state what, not how; allows for optimization, etc.

Polaris Formalism
Interface interpreted as a visual specification in a formal language that defines:
-- table configuration
-- type of graphic in each pane
-- encoding of data as visual properties of marks

Specification compiled into data and graphical transformations to generate the display

Example specification
[Figure: example specification, with a brace marking the table configuration]

Formalism Example: Specifying Table Configurations
Interface: define table configuration by dropping fields on shelves

Formalism: shelf content interpreted as expressions in table algebra

Can express an extremely wide range of table configurations

Formalism Example: Specifying Table Configurations
-- Operands are the database fields
   -- each operand is interpreted as a set {}
   -- quantitative and ordinal fields are interpreted differently
-- Three main operators: concatenation (+), cross (X), nest (/)
-- Additionally: dot (.) operator

Table Algebra: Operands
Ordinal fields - interpret domain as a set that partitions the table into rows and columns:
QUARTER = {Quarter1, Quarter2, Quarter3, Quarter4}
[Figure: table with columns Quarter 1 to Quarter 4 holding the values 31,400, 37,120, 35,600, 30,900]

Quantitative fields - treat domain as a single-element set and encode spatially as axes:
PROFIT = {P[0 - 65,000]}
[Figure: axis labelled "Profit (in thousands)" with ticks from 10 to 60]
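
The two operand interpretations above can be sketched in a few lines of Python. The `ordinal` and `quantitative` helpers and the tuple representation are assumptions made for this illustration, not part of the Polaris formalism itself; the domains follow the QUARTER and PROFIT examples on the slide.

```python
def ordinal(name, domain):
    """An ordinal field yields one entry per domain value.

    Each entry later becomes its own row or column of the table.
    """
    return [(name, value) for value in domain]


def quantitative(name, low, high):
    """A quantitative field yields a single entry.

    That one entry stands for an axis spanning the range [low, high].
    """
    return [(name, (low, high))]


QUARTER = ordinal("QUARTER", ["Qtr1", "Qtr2", "Qtr3", "Qtr4"])
PROFIT = quantitative("PROFIT", 0, 65000)
```

Under this representation QUARTER contributes four table entries, while PROFIT contributes exactly one, which is why a quantitative field shows up as an axis rather than as a partition of the table.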

Table Algebra: Concatenation (+) Operator
Ordered union of set interpretations:
QUARTER + PRODUCT_TYPE = {QTR1, QTR2, QTR3, QTR4} + {Coffee, Tea} = {QTR1, QTR2, QTR3, QTR4, Coffee, Tea}
PROFIT + SALES = {P[0-65,000], S[0-125,000]}
[Figure: table with columns Quarter 1 to Quarter 4, Coffee, and Tea; axes "Profit (in thousands)" and "Sales (in thousands)"]

Table Algebra: Cross (X) Operator
Cross-product of set interpretations:
QUARTER X PRODUCT_TYPE = {(Qtr1, Coffee), (Qtr1, Tea), (Qtr2, Coffee), (Qtr2, Tea), (Qtr3, Coffee), (Qtr3, Tea), (Qtr4, Coffee), (Qtr4, Tea)}
PRODUCT_TYPE X PROFIT: [Figure: a "Profit (in thousands)" axis repeated for Coffee and for Tea]

Table Algebra: Nest (/) Operator
QUARTER X MONTH
-- would create twelve entries for each quarter, including meaningless ones such as (Qtr1, December)

QUARTER / MONTH
-- would only create three entries per quarter
-- based on tuples in the database, not semantics
-- can be expensive to compute
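
A minimal Python sketch of the three operators may help make the difference between cross and nest concrete. The function names, the list-based representation, and the sample records are assumptions for illustration; in particular, ordering nested pairs by first appearance in the records is a simplification chosen here.

```python
from itertools import product


def concat(a, b):
    """Concatenation (+): ordered union of the two set interpretations."""
    return a + [x for x in b if x not in a]


def cross(a, b):
    """Cross (X): every entry of a paired with every entry of b, in order."""
    return list(product(a, b))


def nest(a_field, b_field, records):
    """Nest (/): only the (a, b) pairs that actually occur in the records.

    This needs a pass over the data, which is why it can be expensive.
    """
    seen = []
    for rec in records:
        pair = (rec[a_field], rec[b_field])
        if pair not in seen:
            seen.append(pair)
    return seen


QUARTER = ["Qtr1", "Qtr2", "Qtr3", "Qtr4"]
PRODUCT_TYPE = ["Coffee", "Tea"]
MONTH = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical records: one per month, tagged with the quarter it falls in.
RECORDS = [{"quarter": QUARTER[i // 3], "month": m} for i, m in enumerate(MONTH)]

quarter_plus_type = concat(QUARTER, PRODUCT_TYPE)      # 6 entries
quarter_x_type = cross(QUARTER, PRODUCT_TYPE)          # 8 entries
quarter_x_month = cross(QUARTER, MONTH)                # 48 entries
quarter_nest_month = nest("quarter", "month", RECORDS)  # 12 entries
```

Here `cross` produces pairs such as `("Qtr1", "Dec")` that no record supports, while `nest` keeps only the three months that occur with each quarter.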

Polaris Display: UI

Polaris Display
Drag and drop fields from the database schema onto shelves

May combine multiple data sources, each data source mapping to a separate layer

Multiple fields may be dragged onto each shelf

Data may be grouped or sorted, and aggregations may be computed

Polaris Display
Selecting a single mark in a graphic displays the values for the mark

Can lasso a set of marks to brush records

Marks in the graphics use retinal properties

Retinal Properties
-- Ordinal/nominal mapping vs. quantitative mapping
-- Properties: shape, size, orientation, and color
-- When encoding a quantitative variable, vary only one aspect at a time
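
The distinction between ordinal and quantitative mappings can be sketched as follows. The shape palette, the size range, and both helper functions are hypothetical choices for this example and do not describe how Polaris itself assigns retinal properties.

```python
# Hypothetical palette of visually distinct shapes for ordinal values.
SHAPES = ["circle", "square", "triangle", "cross"]


def encode_ordinal(domain, palette=SHAPES):
    """Map each category to its own distinct shape."""
    if len(domain) > len(palette):
        raise ValueError("not enough distinct shapes for this domain")
    return dict(zip(domain, palette))


def encode_quantitative(value, low, high, min_size=2.0, max_size=20.0):
    """Map a value in [low, high] linearly onto a mark size.

    Only one aspect (size) varies with the value.
    """
    fraction = (value - low) / (high - low)
    return min_size + fraction * (max_size - min_size)


shape_of = encode_ordinal(["Coffee", "Tea"])
size = encode_quantitative(32500, 0, 65000)
```

An ordinal field gets a lookup table of discrete values, whereas a quantitative field gets a continuous scale over a single property.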

Visual Specification
-- Is the configuration of the fields of the tables on shelves
-- User does this by dragging and dropping fields onto shelves
-- Controls:
   -- Mapping of data sources to layers
   -- Number of rows, columns, and layers, and their relative order
   -- Selection of tuples from the database
   -- Grouping of data within a pane
   -- Type of graphic displayed in each pane
   -- Mapping of data fields to retinal properties

Graphics
-- Ordinal-Ordinal: e.g. the table
   -- the axis variables are typically independent of each other
-- Ordinal-Quantitative: e.g. bar chart
   -- the quantitative variable is often dependent on the ordinal variable
-- Quantitative-Quantitative: e.g. maps
   -- view distribution of data as a function of one or both variables; discover causal relationships

Types of Graphics (Ordinal-Ordinal)
Axis variables are independent of each other

R represents the fields encoded in the retinal properties of the marks
The following slide shows sales and margin as a function of product type, month, and state for items sold by a coffee chain

Ordinal-Quantitative Graphics
-- Bar charts, dot plots, Gantt charts
-- Quantitative variable is dependent on the ordinal variable

Following slide shows a case where a matrix of bar charts is used to study several functions of the independent variables product and month

Quantitative-Quantitative Graphics
Discover causal relationships between the two quantitative variables.
The following slide shows how flight scheduling varies with the region of the country in which the flight originated

Visual mappings
-- Encoding different fields of the data to retinal properties
-- Shape, Size, Orientation, Color
-- Used in the ordinal-to-ordinal example

Display Types

-- Gantt charts of events for a parallel graphics application on a 32-processor SGI machine.
-- Flights between major airports in the USA.
-- Source code colored by cache misses for a parallel graphics application.
-- Major wars and the births of well-known scientists as a timeline.

Querying
Three steps:
1. Select the records
2. Partition the records into panes
3. Transform the records within the panes

To create database queries, it is necessary to generate an SQL query per table pane (i.e. must iterate over entire table, executing SQL for each pane).
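
The per-pane iteration described above might look roughly like the following Python sketch. The table name `sales`, the column names, and the exact shape of the generated SQL are assumptions for illustration; the slides state only that one SQL query is issued per pane, not what those queries look like.

```python
QUARTERS = ["Qtr1", "Qtr2", "Qtr3", "Qtr4"]
PRODUCT_TYPES = ["Coffee", "Tea"]


def pane_query(table, pane_filter, group_by, measure):
    """Build one parameterized SQL query for a single pane.

    pane_filter pins the pane's row/column values, group_by lists the
    fields kept inside the pane, and measure is summed per group.
    Identifiers are interpolated directly, which is fine for a sketch
    but would need validation in real code.
    """
    where = " AND ".join(f"{field} = ?" for field in pane_filter)
    columns = ", ".join(group_by + [f"SUM({measure}) AS total_{measure}"])
    sql = (
        f"SELECT {columns} FROM {table} "
        f"WHERE {where} GROUP BY {', '.join(group_by)}"
    )
    return sql, list(pane_filter.values())


# Iterate over the whole table: one query per (row, column) pane.
queries = {
    (q, p): pane_query("sales", {"quarter": q, "product_type": p}, ["state"], "profit")
    for q in QUARTERS
    for p in PRODUCT_TYPES
}
```

With four quarters on one axis and two product types on the other, this produces eight queries, one for each pane.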

Transformations and Data Flow

Generating Database Queries
1. Selecting the Records

Generating Database Queries
2. Partitioning the Records into Panes
Putting retrieved records in their corresponding pane

Generating Database Queries
3. Transforming Records within the Panes
If aggregation is requested, it is done here
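
The three steps can also be traced on in-memory records, as in this Python sketch. The sample records, field names, and helper functions are invented for the example, and real Polaris pushes this work into database queries rather than Python loops.

```python
from collections import defaultdict

# Hypothetical records; none of these values come from the slides.
RECORDS = [
    {"quarter": "Qtr1", "product_type": "Coffee", "profit": 10},
    {"quarter": "Qtr1", "product_type": "Tea", "profit": 4},
    {"quarter": "Qtr1", "product_type": "Coffee", "profit": 6},
    {"quarter": "Qtr2", "product_type": "Tea", "profit": 5},
    {"quarter": "Qtr2", "product_type": "Coffee", "profit": -2},
]


def select(records, predicate):
    """Step 1: keep only the records that satisfy the filter."""
    return [r for r in records if predicate(r)]


def partition(records, row_field, col_field):
    """Step 2: put each record into the pane for its (row, column) values."""
    panes = defaultdict(list)
    for r in records:
        panes[(r[row_field], r[col_field])].append(r)
    return dict(panes)


def transform(panes, measure, aggregate=sum):
    """Step 3: aggregate the measure within each pane."""
    return {key: aggregate(r[measure] for r in recs) for key, recs in panes.items()}


selected = select(RECORDS, lambda r: r["profit"] > 0)
panes = partition(selected, "quarter", "product_type")
totals = transform(panes, "profit")
```

Passing a different `aggregate` function, such as `max`, changes only the third step and leaves selection and partitioning untouched.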

Example application
-- Cut expenses for a national coffee store
-- Create a table of scatterplots showing the relationship between marketing costs and profit (Figure 6a)
-- Notice the trend: certain products have high marketing costs with little or no profit

Polaris
Demo: Tableau Software

Related Work
Single relation visualization
-- APT
-- Sage/SageBrush
-- DEVise

Multiple relation visualization
-- Visage
-- DataSplash/Tioga-2
-- Rivet/Polaris
-- Sieve

Related Work
Formalisms for Graphics
-- Wilkinson's Grammar of Graphics
-- Bertin's Semiology of Graphics
-- Mackinlay's APT

Visual Queries
-- Trellis display, DEVise, Visage

Table-based Visualizations
-- Table Lens, Spreadsheet for Visualization

Interesting, upcoming projects:
-- IBM Many Eyes: a free site that allows users to upload data and then produce graphic representations for others to view and comment upon.
-- Processing: open source programming language and environment for people who want to create images, animations, and interactions.
-- Prefuse: interactive information visualization toolkit

Commercial visualization software: Tableau, Qlikview, Tibco Spotfire, Microsoft BI platform (PowerPivot, Excel 2010, SQL Server with VertiPaq, SSAS, SSRS and SSIS)

To "democratize" visualization, and experiment with new collaborative techniques, we built Many Eyes, a web site where people may upload their own data, create interactive visualizations, and carry on conversations. The goal is to foster a social style of data analysis in which visualizations serve not only as a discovery tool for individuals but also as a means to spur discussion and collaboration.

Processing: from MIT Media Lab: Processing is an open source programming language and integrated development environment (IDE) built for the electronic arts and visual design communities with the purpose of teaching the basics of computer programming in a visual context, and to serve as the foundation for electronic sketchbooks.

One of the stated aims of Processing is to act as a tool to get non-programmers started with programming, through the instant gratification of visual feedback. The language builds on the Java programming language, but uses a simplified syntax and graphics programming model.

Wilkinson's Grammar of Graphics
Describes formalism for statistical graphics

Different choices in the design of formalism:
-- non-relational data model
-- different operators in table algebra

Further experience necessary to fairly evaluate differences between our formalisms

Conclusions
Novel interface for rapidly constructing table-based graphical displays from multi-dimensional relational databases

A formalism for specifying complex graphics and tables

Interpretation of visual specifications as relational (SQL) queries and drawing operations.

-- Interactive exploration of large multi-dimensional databases
-- Expressive set of graphical displays
-- Uses tables to organize multiple graphs on a display

Discussion
Allows overlap between the relations that are divided into each pane of the Polaris display, unlike the basic Pivot Table model.

Allows more versatile computation of aggregates (e.g., medians and averages, in addition to sums).

Intuitive drag-and-drop interface, like that seen in Pivot Tables

Remarks
Merits:
-- A cohesive architecture for coordinating visualization components
-- Flexible and easy user interface, no programming needed
-- Support for visual query
-- Good integration between query and visualization schema
Shortcomings:
-- Not an extensible architecture for the data analysis system
-- Limited support for coordinated data navigation (pan, zoom)
-- Lack of support for hierarchical data (fix: dot operator)

Possible Improvements
Generate database tables from a selected set of marks. Use the selected marks in one display as the data input to another.

Integrate a table lens, instead of having to click a mark to view its details.

Exploring interaction techniques for navigating hierarchical structures of multi-dim databases.

Provide an adapter to link external data sources without explicitly storing data in the analysis system.

Polaris: Extended Formalism
Additional formalism defined in papers*:
-- specification of different graph types
-- encoding of data as retinal properties of marks in graphs
-- data transformations
-- translation of visual specification into SQL queries

* Relevant papers:
-- Chris Stolte, Diane Tang and Pat Hanrahan. Query, Analysis, and Visualization of Hierarchically Structured Data using Polaris. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 2002.
-- Chris Stolte, Diane Tang and Pat Hanrahan. Polaris: A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases (extended paper). IEEE Transactions on Visualization and Computer Graphics, Vol. 8, No. 1, January 2002.

Formalism: Extensions
Can mix graph types in a single visualization:

Dot (.) operator: Hierarchies
-- Many data warehouses have hierarchical dimensions:
   -- Time: Year, Month, Day
   -- Location: Country, State, Region
-- Dot (.) works like Nest (/) except that it exploits the defined hierarchies
   -- based on semantics, not tuples in the database

Demo

Questions?

http://graphics.stanford.edu/projects/polaris/

