
TSINGHUA SCIENCE AND TECHNOLOGY, ISSN 1007-0214, 06/08, pp. 171-181, Volume 18, Number 2, April 2013

ManyInsights: A Visual Analytics Approach to Supporting Effective Insight Management

Yang Chen* and Jing Yang

Abstract: Although significant progress has been made towards effective insight discovery in visual analytics systems, there are few effective approaches for managing the large number of insights generated in visual analytics processes. This paper presents ManyInsights, a multidimensional visual analytics prototype that integrates several novel insight management approaches proposed by the authors in their previous work. These approaches include insight annotation, browsing, retrieval, organization, and association. This paper also reports a long-term case study that evaluated ManyInsights with a domain expert, realistic analytic tasks, and real datasets.

Key words: visual analytics; insight management; multidimensional visualization

1 Introduction

Visual analytics is an emerging research area that aims to solve complex and dynamic data analysis problems[1]. Its application domains range from homeland security and terrorism detection to financial market analysis. Recently, numerous visual analytics approaches have been developed to facilitate sensemaking of complex, massive data. These approaches often capture a vast number of insights from the data. To effectively support analytic activities such as hypothesis evaluation and collaborative reasoning, Insight Management (IM), the process of annotating, retrieving, associating, and organizing insights, becomes essential in visual analytics approaches.

This paper addresses the challenge of IM for analytic insights. Analytic insights, one of several categories of insights, are the most traditional sense of insights supported in visualization systems[2]. They come from exploratory analyses and consist of a body of data that has been given meaning through users' analytic tasks[2]. Wehrend and Lewis[3] identified 11 low-level analytic tasks that can result in an analytic insight, such as classification and ranking.

• Yang Chen and Jing Yang are with the Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA. E-mail: [email protected]; [email protected]

* To whom correspondence should be addressed. Manuscript received: 2013-03-07; accepted: 2013-03-07

Significant effort has been directed towards managing analytic insights in visual analytics systems. However, as we stated in Ref. [4], most existing approaches suffer from the following problems: (1) Manual insight annotation is often required, such as manually posting insights[5] or attaching hand-drawn marks to the visualization views[6]. Manual annotation is time-consuming and reduces users' interest in annotating insights. Moreover, manually generated annotations can be incomplete, imprecise, and hard to understand, which makes subsequent IM activities such as insight retrieval and exchange difficult. (2) Most existing approaches require users to manually detect and organize relationships among insights, for example, by manually associating related insights[7]. Such manual approaches are difficult to apply to complex sensemaking tasks that involve a large number of insights and multiple users. (3) Searching for and reusing recorded insights is time-consuming with existing approaches, especially in an asynchronous collaboration environment. In such environments, constructing queries to fetch stored insights is often challenging, since different users may use various terms to express similar meanings when manually annotating insights. Users may also have difficulty understanding insights recorded by others, since the annotation process is not standardized.

The above challenges need to be addressed to achieve effective and efficient IM. Toward this goal, we have proposed a general IM framework[4] and a set of IM approaches[4, 8, 9] based on this framework, which are summarized in this paper. Our approaches address these challenges around the concept of facts, namely the results of users' low-level analytic tasks[10]. Facts are essential components of insights and convey the rich semantics of users' analytic tasks. From our observation[4], for the same type of data (e.g., multidimensional data), users can effectively classify most facts into a small number of categories, independent of the domain, application, and visualization tool. In addition, the same set of information is often used to annotate facts in the same category. Therefore, visual analytics systems can semi-automatically annotate facts in a formalized format by predicting what should be recorded for facts in popular categories. Based on the formalized annotations, insights can be browsed, retrieved, associated, and organized effectively.

ManyInsights is a prototype we have developed as a testbed for the above IM approaches for multidimensional data. Although the individual IM components of ManyInsights, such as annotation and association, have been reported in our previous papers[4, 8, 9], they have never been presented as a whole to give a full picture of how they work together for effective visual reasoning. This paper presents ManyInsights as an integrated IM system. In addition, this paper extends previous work with a long-term case study of ManyInsights conducted by a domain expert with real datasets and real research tasks. The case study provides an in-depth understanding of how the proposed IM approaches work together to facilitate exploratory data analysis.

2 Related Work

Experiments conducted by Robinson[11] provide evidence that annotating, organizing, and sharing visual analysis results is critical for successful collaborative reasoning. A few visualization systems allow users to manage insights in visual analytics processes. For example, ManyEyes[5] provides a discussion forum, where users can share their findings or free-form thoughts by manually posting comments. A URL bookmarking mechanism links the comments back to the associated views, so that users can revisit and evaluate their findings. Ellis and Groth[12] proposed allowing users to share insights through annotations in a collaborative data visualization. Users must create annotations and search for insights of interest manually. Shrinivasan and van Wijk[7] allow users to take notes on analytic artifacts such as findings, hypotheses, and causal relations. The notes can be organized into groups to form a highly structured argumentation.

Recently, initial efforts have been made towards managing insights by integrating automatic analysis and visual exploration techniques. The Nugget Management System proposed by Yang et al.[13] allows users to extract, refine, and record nuggets (subsets of multivariate data) with the help of automatic analysis techniques. Statistical information, such as the number of data records included and the average value on each dimension, can be automatically computed and attached to a discovered nugget in addition to manual annotations given by users. Currently, this system supports the discovery and annotation of clusters in multivariate data. Shrinivasan and van Wijk[14] proposed an approach for automatically associating findings based on users' exploration action histories, such as zooming and panning. However, insights are difficult to summarize when the exploration histories consist of steps with little semantic meaning. In addition, the large number of exploration steps leading to each insight may hinder a system from effectively organizing and associating large numbers of insights.

3 General Insight Management Framework

The IM approaches we propose are intended to support the following utility goals:

• Goal 1: To keep found things found[15], i.e., to capture, annotate, retrieve, and reuse insights in a visual analytics process;

• Goal 2: To reveal the relationships among insights and integrate them for hypothesis generation and evaluation.

To develop effective IM approaches, we proposed a three-component model to describe an insight[4]. According to this model, each insight consists of a fact extracted from the data under analysis, such as an outlier, a pattern, or a relationship; a knowledge base against which the fact is evaluated; and subjective evaluations of the fact. In a typical case, an analyst discovers a fact as the result of an analytic task during an interactive visual exploration process. The analyst then evaluates the fact against the knowledge base to see if it is a significant and reliable piece of evidence that can be used in the sensemaking process. Together, the fact, the applied knowledge base, and the evaluations constitute an insight for the sensemaking process.
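As a rough illustration, the three-component model can be written as a small data structure. The Python sketch below is our own assumption, not code from ManyInsights: the class and field names (Fact, Insight, knowledge_base, evaluations, is_complete) are hypothetical. It shows the division of labor described above: the system can capture the fact, while the knowledge base and evaluations must come from the analyst.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    """What was extracted from the data (hypothetical fields)."""
    category: str            # e.g. "outlier", "cluster", "rank"
    dataset: str
    dimensions: list[str]
    data_items: list[str]


@dataclass
class Insight:
    """Hypothetical encoding: insight = fact + knowledge base + evaluations."""
    fact: Fact
    knowledge_base: list[str] = field(default_factory=list)  # supplied by the analyst
    evaluations: list[str] = field(default_factory=list)     # supplied by the analyst

    def is_complete(self) -> bool:
        # The fact can be captured by the system, but the other two
        # components must come from the user before the insight is complete.
        return bool(self.knowledge_base) and bool(self.evaluations)


fact = Fact("outlier", "U.S. per capita CO2 emission (2005)",
            ["Per capita emissions"], ["Alaska"])
insight = Insight(fact)
print(insight.is_complete())  # False: no knowledge base or evaluation yet
```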

Among the three components, the knowledge base is difficult to handle with a general approach, since it varies significantly between datasets, applications, and analysts. Subjective evaluations must also be provided by users. On the other hand, the types of facts that can be discovered from a certain type of data are mostly predictable and domain independent. Therefore, we believe that general approaches can be developed to allow visualization users to manage facts effectively. A general IM framework is proposed based on this idea. This framework can be extended to support IM in a variety of application domains by integrating their knowledge bases.

The general IM framework is shown in Fig. 1. The foundation of all IM approaches is a fact taxonomy that summarizes fact categories, fact attributes, and the relations through which one insight can be associated with another. The taxonomy serves three important functions. First, it increases the automation of insight annotation. After a user discovers a fact and decides its category, the computer can automatically collect and extract information about the fact following the taxonomy to capture its semantics. The user only needs to provide the knowledge base and subjective evaluations to complete an insight annotation. Second, since the information obtained for each category of facts is predictable, the computer can produce highly formalized annotations. This greatly increases the automation of other IM activities, such as insight browsing, retrieval, association, and organization. For example, insight clusters can be constructed automatically, since the computer can capture correlations among the formalized annotations following the taxonomy. Third, formalized annotations enable effective communication among multiple users and multiple systems.

Fig. 1 The general IM framework.

4 ManyInsights

ManyInsights is a fully working prototype of the general IM framework for multidimensional datasets. Similar to existing online visualization applications, such as ManyEyes[5], ManyInsights allows users to upload multidimensional datasets, create visualizations (e.g., scatter plot and parallel coordinates), and explore the visualizations for insights. Beyond these commonly supported tasks, ManyInsights provides rich IM functions.

Following the framework, a taxonomy was first constructed to categorize facts from multidimensional data and summarize their essential attributes[4]. The taxonomy consists of 12 categories of facts, namely value/derived value, distribution, difference, extreme, rank, categories, cluster, outliers, association, trend, compound fact, and meta fact. For each category, the content attributes (e.g., dimensions and cluster radius for cluster) and context attributes (e.g., dissimilarity between clusters) are identified.
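One possible encoding of the taxonomy is an enumeration of the 12 categories plus a table of attributes per category. This is a hypothetical Python sketch: the category names follow the list above, but the attribute identifiers, and the choice to fill in only the cluster entry (the one example the text gives), are our assumptions.

```python
from enum import Enum


class FactCategory(Enum):
    """The 12 fact categories of the multidimensional fact taxonomy."""
    VALUE = "value/derived value"
    DISTRIBUTION = "distribution"
    DIFFERENCE = "difference"
    EXTREME = "extreme"
    RANK = "rank"
    CATEGORIES = "categories"
    CLUSTER = "cluster"
    OUTLIERS = "outliers"
    ASSOCIATION = "association"
    TREND = "trend"
    COMPOUND = "compound fact"
    META = "meta fact"


# Content and context attributes per category. Only the cluster entry is
# filled in, after the examples in the text; the identifiers are hypothetical.
ATTRIBUTES: dict[FactCategory, dict[str, list[str]]] = {
    FactCategory.CLUSTER: {
        "content": ["dimensions", "cluster_radius"],
        "context": ["dissimilarity_to_other_clusters"],
    },
}


def attributes_for(category: FactCategory) -> dict[str, list[str]]:
    """Attributes the system should record for a category (empty if undefined)."""
    return ATTRIBUTES.get(category, {"content": [], "context": []})


print(attributes_for(FactCategory("cluster"))["content"])
```

Keeping the taxonomy as data rather than code means a new category only needs a new table entry.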

Built on the taxonomy, ManyInsights provides a variety of automatic and semi-automatic IM approaches, as identified in Section 3. The following scenario describes how they work together to facilitate a sensemaking process:

(1) Users visually explore one or more datasets in the visualization for insights. After they find an insight, they highlight the data of interest, select the type of the fact, and enter the knowledge base and subjective evaluations. ManyInsights automatically collects content and context information about the fact and combines it with the other user input to generate a formalized insight annotation. The annotation is stored in an insight database, which can be shared by many users in a collaborative analysis environment. Pairwise insight correlations can be calculated between any two formalized insight annotations.

(2) Later on, the users can retrieve and browse insights annotated by themselves or other users from the database via a faceted search interface. They can also browse the insights in the visualization using scented insight browsing.

(3) After the users retrieve insights of interest, they can interactively explore them in an automatically generated dynamic insight clustering display. This view reveals insight clusters consisting of closely related insights, the discovery history of these clusters, and their semantics. As the users' interests shift, correlations among the insights can be calculated differently to reveal different clusters.

(4) The users create hypotheses and associate the insights with the hypotheses. Insights associated with one or more hypotheses can be examined in detail in the region graph for visual sensemaking.

(5) The users annotate their key findings and hypotheses for future exploration.

4.1 Insight annotation

According to the fact taxonomy, the following information is recorded for each insight annotated in ManyInsights: (1) fact information, such as dataset names, types of facts (e.g., clusters, outliers, rank, and correlation), relevant dimensions and data items, and essential characteristics of the facts (such as the mean of a cluster); (2) user-generated semantic information, such as hypotheses associated with the insights, tags recording the knowledge base, and subjective evaluations; and (3) meta information, such as names of the authors who annotated the insights and timestamps recording when the insights were annotated.

A semi-automatic insight annotation approach named Click2Annotate[8] is integrated into the visualization to collect the above information in ManyInsights. Using predefined or customized annotation templates for typical fact types, users can annotate most insights with a few mouse clicks. More specifically, when users discover a fact of interest during visual exploration, such as the cluster shown in Fig. 2a-1, they brush the relevant data, judge the type of fact, and select the template for this type (see Fig. 2a-3). Following the template, the system automatically retrieves information about the fact and creates a formalized annotation. The annotation is then presented visually so that users can review it and complete it by adding domain-specific information and their evaluations (see Fig. 2b).

Fig. 2 Semi-automatic annotation generation using Click2Annotate.
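The template mechanism can be sketched as a function that turns brushed records into fact information, to which user input and meta information are then added. In this hypothetical Python sketch, cluster_template and annotate are illustrative names, the cluster template records only the size and per-dimension mean (the mean being one essential characteristic mentioned in Section 4.1), and the brushed values are made up; Click2Annotate's actual templates may record more.

```python
from datetime import datetime, timezone
from statistics import mean


def cluster_template(dataset, rows, dimensions):
    """Hypothetical 'cluster' template: compute fact information from brushed rows."""
    return {
        "dataset": dataset,
        "fact_type": "cluster",
        "dimensions": list(dimensions),
        "size": len(rows),
        # Per-dimension mean of the brushed records (the cluster centre).
        "mean": {d: mean(row[d] for row in rows) for d in dimensions},
    }


def annotate(template, dataset, rows, dimensions, author, tags=(), evaluation=""):
    """Build a formalized annotation: system-computed fact information,
    user-supplied semantics, and automatically recorded meta information."""
    annotation = template(dataset, rows, dimensions)
    annotation["tags"] = list(tags)          # knowledge base, from the user
    annotation["evaluation"] = evaluation    # subjective evaluation, from the user
    annotation["author"] = author            # meta information
    annotation["timestamp"] = datetime.now(timezone.utc).isoformat()
    return annotation


# Two brushed records with made-up values.
brushed = [{"income": 40.0, "density": 10.0}, {"income": 44.0, "density": 14.0}]
note = annotate(cluster_template, "U.S. census (2005)", brushed,
                ["income", "density"], author="analyst", tags=["energy"])
print(note["mean"])  # {'income': 42.0, 'density': 12.0}
```

Passing the template as a function keeps annotate() independent of fact type, which mirrors the idea of one template per category.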

4.2 Insight retrieval

As the number of annotated insights grows, effective insight retrieval becomes essential. Inspired by the faceted search[16] used in many online stores, we propose a faceted insight search approach[8]. In particular, a set of common attributes shared by multiple formalized annotations, including author, time, title, fact type, dataset, dimension, and tag, serve as faceted filters for searching insights in ManyInsights. Through the faceted search interface provided by ManyInsights (see Fig. 3), users can apply these filters in any order to narrow down the insights. Each retrieved insight is represented as an annotation card, which summarizes the insight using a visualization thumbnail and a short sentence that captures its essential information. For example, two annotation cards are shown on the right of Fig. 3.


Fig. 3 The faceted insight search interface. Users can search insights by multiple attributes and browse them in annotation cards.
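Faceted filtering can be sketched as AND across facets and OR within a facet. This is a minimal Python sketch under our own assumptions (insights as dictionaries, the function name faceted_search, and the sample records are all hypothetical), not the ManyInsights implementation.

```python
def faceted_search(insights, **facets):
    """Return the insights that match every selected facet (logical AND).

    A facet value may be a single value or a set of accepted values (OR within
    the facet). For list-valued attributes such as dimensions or tags, an
    insight matches if any of its values is accepted.
    """
    def matches(insight, facet, accepted):
        if not isinstance(accepted, (set, frozenset)):
            accepted = {accepted}
        value = insight.get(facet)
        if isinstance(value, (list, tuple, set)):
            return bool(accepted & set(value))
        return value in accepted

    return [ins for ins in insights
            if all(matches(ins, f, v) for f, v in facets.items())]


insights = [
    {"id": 1, "author": "A", "fact_type": "outlier", "dataset": "emissions",
     "dimensions": ["per capita emissions"], "tags": ["energy"]},
    {"id": 2, "author": "B", "fact_type": "rank", "dataset": "emissions",
     "dimensions": ["per capita emissions"], "tags": []},
    {"id": 3, "author": "A", "fact_type": "cluster", "dataset": "census",
     "dimensions": ["income", "density"], "tags": ["energy"]},
]
hits = faceted_search(insights, author="A", dimensions="density")
print([h["id"] for h in hits])  # [3]
```

Because the facets are combined with AND, the order in which they are applied does not change the result.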

4.3 Insight revisiting

To allow users to revisit insights, ManyInsights provides scented insight browsing. It is an “annotate once, appear anywhere” approach[17] that allows users to access relevant insights conveniently during their visual exploration process. If a user turns on the scented browsing mode, insight flags, each representing an insight, are attached to the visualizations: not only to the views where the insights were captured, but also to other views where the relevant data items or dimensions of the insights can be observed (see Fig. 4). Users can revisit an insight by clicking its flag to highlight its relevant data in a visualization and display its annotation card. To reduce clutter, users can interactively select the insights they want to flag using the faceted insight retrieval approach described above.
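The flag placement rule can be approximated as follows. This hypothetical Python sketch simplifies relevance to sharing the view's dataset and at least one visible dimension (it ignores data items); the function name, parameters, and records are illustrative only.

```python
def flags_for_view(insights, view_dataset, view_dimensions, selected_ids=None):
    """Return the insights that should receive a flag in the current view.

    An insight is flagged if it concerns the view's dataset and shares at least
    one dimension with the view, regardless of which view it was captured in.
    selected_ids optionally limits the flags to a subset (for example the
    result of a faceted search) to reduce clutter.
    """
    shown = set(view_dimensions)
    flagged = []
    for ins in insights:
        if selected_ids is not None and ins["id"] not in selected_ids:
            continue
        if ins["dataset"] == view_dataset and shown & set(ins["dimensions"]):
            flagged.append(ins)
    return flagged


insights = [
    {"id": 1, "dataset": "census", "dimensions": ["income", "density"]},
    {"id": 2, "dataset": "census", "dimensions": ["age"]},
    {"id": 3, "dataset": "fuel", "dimensions": ["density"]},
]
# A hypothetical scatter plot of census density vs. population.
print([i["id"] for i in flags_for_view(insights, "census", ["density", "population"])])  # [1]
```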

4.4 Insight correlation exploration

In ManyInsights, users can interactively explore the correlations among insights. The correlation between two insights is calculated as follows: (1) A similarity measure ranging from 0 to 1 is calculated for each attribute of the insights. For example, if the insights were created by the same author, they are assigned a similarity value of 1 on authorship; if they are related to the same set of dimensions, they are assigned a similarity value of 1 on dimensions. (2) Each similarity measure is assigned a weight according to user input. For example, if users want to associate the insights by authorship only, they can set the weight of authorship to 1 and the weights of all other attributes to 0. (3) The weighted sum of the similarity measures is computed; the result is the correlation of the two insights. By adjusting the weights, users can group and associate insights according to a variety of interests, such as by author, by shared dimensions, or by fact type.

Fig. 4 Scented insight browsing. Users can browse an insight by clicking the flag associated with it.
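The three steps amount to corr(x, y) = Σ_k w_k · s_k(x, y), where s_k is the similarity on attribute k and w_k its user-set weight. The Python sketch below assumes four attributes and uses exact match for author and fact type and Jaccard overlap for dimension and tag sets; Jaccard is our choice for partial overlap (it gives 1 for identical sets, as the text requires), and all names are hypothetical.

```python
def jaccard(a, b) -> float:
    """Set similarity in [0, 1]; identical sets (including two empty ones) give 1."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def attribute_similarity(x: dict, y: dict) -> dict[str, float]:
    """Step (1): one similarity measure in [0, 1] per attribute."""
    return {
        "author": 1.0 if x["author"] == y["author"] else 0.0,
        "fact_type": 1.0 if x["fact_type"] == y["fact_type"] else 0.0,
        "dimensions": jaccard(x["dimensions"], y["dimensions"]),
        "tags": jaccard(x["tags"], y["tags"]),
    }


def correlation(x: dict, y: dict, weights: dict[str, float]) -> float:
    """Steps (2) and (3): weighted sum of the per-attribute similarities.

    Attributes missing from weights get weight 0. If the weights sum to 1,
    the result stays in [0, 1].
    """
    sims = attribute_similarity(x, y)
    return sum(weights.get(name, 0.0) * s for name, s in sims.items())


x = {"author": "A", "fact_type": "rank", "dimensions": ["income", "density"], "tags": ["energy"]}
y = {"author": "A", "fact_type": "outlier", "dimensions": ["density"], "tags": ["energy", "policy"]}
print(correlation(x, y, {"author": 1.0}))                   # 1.0
print(correlation(x, y, {"dimensions": 0.5, "tags": 0.5}))  # 0.5
```

Setting a single weight to 1 reproduces the "associate by authorship only" example from the text.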

Once the insight correlations are computed, two coordinated visualization views, the dynamic insight clustering display and the region graph, allow users to interactively group, associate, and compare insights.

4.4.1 Dynamic insight clustering display

The dynamic insight clustering display[9] reveals correlations among insights by placing related insights close to each other. It also reveals the temporal evolution of the annotation activities through controllable animations. Figure 5–1 shows 90 insights in the display. Insights are represented as particles with a variety of shapes indicating their fact types (see the shape legend in Fig. 5–1). The luminance of the particles indicates the age of the insights (the darker, the older). Insights are automatically clustered according to their correlations (refer to Ref. [18] for details of the underlying force-based dynamic system).
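Ref. [18] describes the actual force-based system; as a generic stand-in, a basic spring embedder already shows how correlations can drive proximity. In this hypothetical Python sketch every pair of insights is joined by a spring whose rest length is 1 - correlation, and particles move by small gradient steps; the parameters (iterations, step, seed) are arbitrary and the function is not the ManyInsights algorithm.

```python
import math
import random


def force_layout(corr, iterations=500, step=0.05, seed=0):
    """Spring-embedder sketch: place insights in 2-D so that the distance
    between two insights approaches 1 - correlation.

    corr is an n x n list of correlations in [0, 1]. Returns [x, y] per insight.
    """
    n = len(corr)
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n)]
    for _ in range(iterations):
        force = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) or 1e-9
                rest = 1.0 - corr[i][j]          # highly correlated -> short spring
                pull = (dist - rest) / dist      # > 0 attracts, < 0 repels
                force[i][0] += pull * dx
                force[i][1] += pull * dy
                force[j][0] -= pull * dx
                force[j][1] -= pull * dy
        for i in range(n):                       # move every particle a small step
            pos[i][0] += step * force[i][0]
            pos[i][1] += step * force[i][1]
    return pos


# Two pairs of strongly correlated insights, unrelated across pairs.
corr = [[1.0, 0.9, 0.0, 0.0],
        [0.9, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.9],
        [0.0, 0.0, 0.9, 1.0]]
layout = force_layout(corr)
```

Re-running the layout with correlations computed under new weights is one simple way to obtain the "re-clustering" behavior the display offers.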

Labels are automatically generated to convey the semantics of the insight clusters (see Fig. 5–3 for an example). Users can interactively control which types of insight content are included in the labels. To track insights with keywords of interest (keywords are meaningful words in the annotations, such as tags, dimension names, and data item names), users assign colors to them through the keyword table (see Fig. 5–5). An insight can have multiple colors if it contains multiple keywords of interest.

Users can dynamically cluster the insights in this view to reflect their current exploration interest. For example, when users set the tag weight to 1 through the star glyph (see Fig. 5–4), insights are grouped by their tags. Users can also interactively adjust the importance of keywords from the keyword table (see Fig. 5–5) to cluster the insights by keywords of interest. They can play animations to examine the annotation history. During the animation, insights are continuously injected into the display in chronological order of their timestamps. The layout gradually evolves to reveal how clusters form and evolve over time. Users can also examine the temporal distribution of the annotations in the timeline, in which insights are represented as bars along a time axis.

Fig. 5 The dynamic insight clustering display. The weight of tag similarity is set to 1 and the weights of other similarity measures are set to 0. Each insight is represented by a shaped particle, colored according to the keywords it contains. Selected insights are highlighted by orange halos.
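The animation and the timeline can be sketched as a chronological replay of the annotation timestamps. In this hypothetical Python sketch, animation_frames yields the growing set of visible insights and timeline_positions maps each insight to a bar position on a time axis of arbitrary width; the timestamps are made up.

```python
from datetime import datetime


def animation_frames(insights):
    """Yield the set of visible insights after each injection, oldest first."""
    visible = []
    for ins in sorted(insights, key=lambda i: i["timestamp"]):
        visible.append(ins)
        yield list(visible)   # snapshot; the layout would re-settle after each step


def timeline_positions(insights, width=100.0):
    """Horizontal bar position of each insight on a time axis of the given width."""
    times = [ins["timestamp"] for ins in insights]
    start, end = min(times), max(times)
    span = (end - start).total_seconds() or 1.0   # avoid dividing by zero
    return {ins["id"]: (ins["timestamp"] - start).total_seconds() / span * width
            for ins in insights}


insights = [
    {"id": "b", "timestamp": datetime(2012, 5, 2, 14, 0)},
    {"id": "a", "timestamp": datetime(2012, 5, 1, 9, 30)},
    {"id": "c", "timestamp": datetime(2012, 5, 2, 16, 45)},
]
for frame in animation_frames(insights):
    print([ins["id"] for ins in frame])
# ['a']
# ['a', 'b']
# ['a', 'b', 'c']
```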

4.4.2 Region graph

The region graph[9], which was inspired by the substrate graph[19], presents the relationships among a group of insights in detail. It also allows users to compare two groups of insights for shared or distinct information. The region graph can have one or two columns, each for an insight group to be examined. In Fig. 6, the details of and relations among insights in the same group are examined. In Fig. 7, two groups of insights are compared and associated.

In the region graph, insights are represented as particles in the same way as in the dynamic insight clustering display. Insights are placed in non-overlapping, user-defined content substrates based on their contents. For example, in Fig. 6, each substrate is a rectangle with a distinct color. It represents a dataset whose name is displayed underneath it. A substrate is further divided into rows, each of which represents a dimension in the dataset. The labels of the dimensions are displayed on the left of the rows. Only datasets and dimensions appearing in the insights are displayed. Each insight is displayed in one or more rows according to the datasets and dimensions it is related to. Its horizontal position is tied to its age: the oldest insights are on the right and the newest ones are on the left. When an insight is displayed in multiple rows (which happens when the insight is related to multiple datasets or multiple dimensions), the topmost particle is drawn solid and the others are drawn with a blurring effect.

Fig. 6 A region graph examining a group of insights.

Fig. 7 Comparing and associating two insight groups using a region graph. Each column represents an insight group. Shared datasets and dimensions are indicated by the same colors and labels. Shared insights are connected by red, undirected dotted links.
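The placement rules above can be sketched as a small layout function. The following hypothetical Python sketch assumes each insight carries a numeric time and a list of (dataset, dimension) locations, orders rows alphabetically (the paper does not state an ordering), and returns abstract coordinates rather than drawing anything.

```python
def region_graph_layout(insights):
    """Place insight particles in a one-column region graph.

    Each insight has an "id", a numeric "time" (larger = newer) and a list of
    (dataset, dimension) "locations". Returns the row keys (top to bottom) and
    one particle per (insight, row).
    """
    # Only datasets/dimensions that occur in some insight get a row; sorting the
    # (dataset, dimension) pairs keeps each dataset's rows contiguous (one substrate).
    rows = sorted({loc for ins in insights for loc in ins["locations"]})
    row_of = {loc: k for k, loc in enumerate(rows)}

    times = [ins["time"] for ins in insights]
    newest, oldest = max(times), min(times)
    span = (newest - oldest) or 1.0

    particles = []
    for ins in insights:
        x = (newest - ins["time"]) / span   # 0.0 = newest (left), 1.0 = oldest (right)
        ys = sorted({row_of[loc] for loc in ins["locations"]})
        for y in ys:
            # The topmost copy is drawn solid; copies in lower rows are blurred.
            particles.append({"id": ins["id"], "row": y, "x": x, "solid": y == ys[0]})
    return rows, particles


insights = [
    {"id": 1, "time": 0.0, "locations": [("U.S. census", "density")]},
    {"id": 2, "time": 10.0, "locations": [("U.S. census", "income"),
                                          ("U.S. transportation fuel", "total use")]},
]
rows, particles = region_graph_layout(insights)
for p in particles:
    print(p)
```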

The region graph represents insight relationships using directed links between insight particles. Insights can have multiple types of relationships, such as shared tags or shared data items, which are distinguished by link color. To reduce clutter, users can interactively turn each type of relationship on or off. The thickness of a link indicates the similarity measure of the relationship. Users can hover the mouse over a link to examine the relationship in detail. For example, in Fig. 6, two insights are connected since they share the same data item “Mississippi”.
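Link construction can be sketched as a pairwise comparison per relationship type. In this hypothetical Python sketch, link thickness is taken as the Jaccard overlap of the values (the paper only says thickness reflects the similarity measure), source and target simply follow input order, and the sample tags are made up; "Mississippi" mirrors the Fig. 6 example.

```python
from itertools import combinations


def region_graph_links(insights, enabled=("tags", "data_items")):
    """Link every pair of insights that shares values of an enabled relationship type.

    Each link carries its type (rendered as link color), the shared values
    (shown on hover), and a thickness in [0, 1]; leaving a type out of
    enabled hides its links.
    """
    links = []
    for a, b in combinations(insights, 2):
        for rel in enabled:
            va, vb = set(a.get(rel, ())), set(b.get(rel, ()))
            shared = va & vb
            if shared:
                links.append({
                    "source": a["id"],
                    "target": b["id"],
                    "type": rel,
                    "shared": sorted(shared),
                    # Assumed similarity: Jaccard overlap of the two value sets.
                    "thickness": len(shared) / len(va | vb),
                })
    return links


insights = [
    {"id": 1, "tags": ["coal"], "data_items": ["Mississippi", "Texas"]},
    {"id": 2, "tags": ["policy"], "data_items": ["Mississippi"]},
]
print(region_graph_links(insights))
```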

Users can select dimensions from the region graph to open a multidimensional display. Within the display, related insights are highlighted, with flags indicating their types. In this way, users can explore the visualization for new insights or examine existing insights for refinement. This function is important for promoting new insights and hypotheses.

4.5 Hypothesis generation

A crucial task of visual analytics is to test hypotheses using insights and to save hypotheses for problem solving and decision making. ManyInsights allows users to record their hypotheses through a special type of insight, namely the hypothesis insight. A hypothesis insight contains a tag and free-form notes entered by users, as well as pointers to relevant insights. To retrieve saved hypotheses, users can search for hypothesis insights. To evaluate a hypothesis or compare two hypotheses, users can easily load the relevant insights into the region graph for comparison and association.
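Because hypotheses are stored as ordinary insights with pointers, retrieving and comparing them reduces to simple lookups. The Python sketch below is hypothetical (function names, fields, and evidence ids are illustrative); the notes loosely paraphrase the case study's conclusion only as sample text.

```python
def make_hypothesis(hid, tag, notes, evidence_ids):
    """A hypothesis insight: a tag, free notes, and pointers to relevant insights."""
    return {"id": hid, "fact_type": "hypothesis", "tags": [tag],
            "notes": notes, "evidence": list(evidence_ids)}


def find_hypotheses(insights):
    """Retrieve saved hypotheses by searching for hypothesis insights."""
    return [ins for ins in insights if ins.get("fact_type") == "hypothesis"]


def compare_hypotheses(h1, h2):
    """Split the evidence of two hypotheses into shared and distinct insights,
    e.g. to decide what to load into the two columns of a region graph."""
    a, b = set(h1["evidence"]), set(h2["evidence"])
    return {"shared": sorted(a & b), "only_first": sorted(a - b),
            "only_second": sorted(b - a)}


h1 = make_hypothesis("h1", "transportation", "Low fuel prices may raise fuel use.", [3, 5, 8])
h2 = make_hypothesis("h2", "density", "Low density may raise highway driving.", [5, 9])
print(compare_hypotheses(h1, h2))
# {'shared': [5], 'only_first': [3, 8], 'only_second': [9]}
```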

5 Case Study

In our previous work[8, 9], we conducted a set of formal user studies and case studies to evaluate the individual functions of ManyInsights, including insight annotation, insight association, and insight clustering. However, it is also necessary to evaluate how these functions work together to benefit domain experts in their real-world analytic tasks. Toward this goal, we conducted a long-term case study with a domain expert using real datasets and real analytic tasks. The study focused on the domain expert's IM activities in a long-term data exploration process.

A researcher with 6 years of research experience in environmental policy participated in the study. He was interested in analyzing energy-related carbon dioxide (CO2) emissions in the U.S., so he used ManyInsights for 8 weeks to analyze relevant datasets.


5.1 Tasks and datasets

In the case study, the researcher conducted two analytic tasks. The first task was to identify which states have the highest per capita CO2 emissions and why their emissions are higher than those of other states. The second task was to provide recommendations for reducing emissions in the states with high CO2 emissions.

The researcher provided his own data, which included 22 multidimensional datasets. The number of dimensions ranged from 4 to 32. Table 1 provides a partial list of these datasets as well as their key dimensions.

5.2 Method

The main methods of the case study were participatory observation and interviews. During the 8-week case study, weekly meetings were held between the researcher and an instructor, each of which included a 2-hour data exploration session. A training session was conducted before the data exploration session in the first meeting, in which the instructor introduced ManyInsights to the researcher and taught him how to use it. In each data exploration session, the researcher was asked to use ManyInsights to conduct the two tasks. Four visualizations (parallel coordinates, scatter plot, bar chart, and pie chart) were used for data exploration at the researcher's request. The instructor observed the process and provided instructions whenever the researcher encountered problems. The analytical artifacts generated by the researcher were collected, such as insight annotations, hypotheses, and screenshots of important visualizations. After each data exploration session, the instructor interviewed the researcher to collect his feedback regarding the system and to understand his analysis process and findings.

Table 1 Partial list of datasets used in the case study.

Dataset | Example dimensions
U.S. per capita carbon dioxide emission (2005) | Per capita emissions
U.S. census (2005) | Population, income per capita, age, educational attainment, housing units, area, density
U.S. transportation fuel (2005) | Highway use, non-highway use, total use
U.S. transportation fuel use and emission (2005) | Transportation fuel emission, fuel consumption
U.S. average electric power emissions (2005) | Electricity emission, generation
U.S. electricity consumption by sector (2005) | Residential, commercial, industrial
U.S. average household emission by state (2005) | Household fuel emission, residential

5.3 Observed analysis process

The researcher began by exploring the 2005 U.S. per capita CO2 emissions dataset. He identified several states with extremely high per capita emissions, such as “Alaska” and “Texas”, and used the outlier and rank templates to record them. Next, the researcher focused on visually exploring three datasets that contained important energy consumption information: transportation fuel use and emission, electric power emissions, and average household emission. For each dataset, the researcher identified the states that ranked highest and lowest in a variety of dimensions and annotated them accordingly. The captured insights were then explored visually in the region graph. The researcher quickly identified several dimensions of interest in the energy consumption datasets, such as “transportation fuel emission” and “household fuel emission”. The insights about these dimensions involved states that also appeared in the insights about high per capita overall emission. The researcher called these dimensions the key emission categories.

After identifying the key emission categories, the researcher explored more datasets related to each category to investigate the factors that caused the emissions. In this process, he focused on insights about dimension correlations and explored these insights using the region graph, as shown in Fig. 8. The region graph helped the researcher develop a global picture of the factors across the multiple datasets. For example, the researcher captured several strong correlations in the transportation fuel use and emission dataset (e.g., “fuel consumption” and “transportation emission”), the census dataset (e.g., “population density” and “per capita fuel consumption”), and the transportation fuel dataset (e.g., “fuel price” and “fuel consumption”). By associating these insights in the region graph (see Fig. 8) and examining the relationships in detail, the researcher concluded that low population density and low fuel prices may cause more highway driving and fuel use, which would account for higher transportation emissions. The researcher commented that the region graph clearly summarized the dimension relationships and allowed him to reach conclusions quickly.

Fig. 8 Exploring the dimension correlations in the region graph to identify the key emission factors.

As the data exploration continued, many insights were captured and annotated. The researcher extensively used the faceted search interface and the dynamic insight clustering display to stay aware of these previous analysis results and to guide the current exploration. More specifically, when exploring a new dataset, the researcher frequently used the faceted search interface to identify the dimensions, data items, and tags that had been captured most frequently in previous analysis sessions. He then used this information to guide the analysis of the current data and to form new hypotheses. He also grouped insights in the dynamic insight clustering display and often assigned high importance to the popular items identified through the cluster labels. In this way, the researcher could easily inspect the insights related to important items and revisit their visualizations for new insights.
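The faceted search step described above amounts to counting, for each facet, how many captured insights mention each value. The following is a minimal sketch under stated assumptions: the `Insight` record, its field names, and the helpers `facet_counts` and `top_values` are hypothetical and are not the ManyInsights data model or implementation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Insight:
    # Hypothetical insight record; the field names are illustrative only.
    dataset: str
    dimensions: list = field(default_factory=list)
    items: list = field(default_factory=list)
    tags: list = field(default_factory=list)


LIST_FACETS = ("dimensions", "items", "tags")


def facet_counts(insights):
    """Count how many insights mention each value of each facet."""
    counts = {"dataset": Counter()}
    counts.update({facet: Counter() for facet in LIST_FACETS})
    for insight in insights:
        counts["dataset"][insight.dataset] += 1
        for facet in LIST_FACETS:
            # A value counts at most once per insight, even if listed twice.
            counts[facet].update(set(getattr(insight, facet)))
    return counts


def top_values(insights, facet, k=3):
    """Return the k most frequent values of one facet as (value, count) pairs."""
    return facet_counts(insights)[facet].most_common(k)


# Toy records for illustration only (not data from the case study).
history = [
    Insight("transportation fuel use and emission",
            ["transportation fuel emission", "fuel consumption"], ["Texas"], ["outlier"]),
    Insight("average household emission",
            ["household fuel emission"], ["Texas", "Alaska"], ["rank"]),
    Insight("transportation fuel use and emission",
            ["transportation fuel emission"], ["California"], ["correlation"]),
]
print(top_values(history, "items", 1))  # prints [('Texas', 2)]
```

Counting each value once per insight keeps a single verbose annotation from dominating the facet; a real interface would likely also let the user filter by one facet and recount the others.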

Toward the end of the study, the researcher used the dynamic insight clustering display and the region graph to review the captured insights and find evidence that could explain the high emissions of these states. The dynamic insight clustering display allowed the researcher to explore the large number of insights in a divide-and-conquer manner. More specifically, the researcher first grouped the insights by the states they involved. After several clusters emerged, he adjusted the attribute importance to find subsets within each cluster that contained interesting dimensions or tags. By partitioning the clusters into smaller groups, the researcher could flexibly explore and compare them in the region graph, where the differences between states in various dimensions could be easily identified. Figure 9 shows an example in which a subset about “Texas” and a subset about “California” are compared side by side; all of these insights relate to transportation. By exploring the links and revisiting the insights in the visualization, the researcher easily identified significant differences between “Texas” and “California” in “registered vehicles” and “public transportation”. He also quickly captured the difference in “fuel price” between “Texas” and “California”. As a result, the researcher concluded that these factors could explain why “Texas” had much higher transportation emissions than “California” even though the two states had similar populations.
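The divide-and-conquer workflow above (group by state first, then re-weight attributes to split each cluster) can be illustrated with a simple hierarchical partition ordered by attribute importance. This is a hedged sketch, not the clustering algorithm used by the dynamic insight clustering display; the dictionary-based insight format and the function `group_by_importance` are assumptions made for illustration.

```python
from collections import defaultdict


def group_by_importance(insights, weights):
    """Partition insights hierarchically, splitting first on the most important attribute.

    insights: list of dicts mapping attribute name -> value (hypothetical format).
    weights:  dict mapping attribute name -> importance; weight 0 means "ignore".
    Returns nested dicts keyed by attribute values, with lists of insights at the leaves.
    """
    # Attributes ordered from most to least important; ties keep dict order.
    order = [attr for attr, w in sorted(weights.items(), key=lambda kv: -kv[1]) if w > 0]

    def split(group, attrs):
        if not attrs:
            return group
        buckets = defaultdict(list)
        for insight in group:
            buckets[insight.get(attrs[0], "(none)")].append(insight)
        return {value: split(members, attrs[1:]) for value, members in buckets.items()}

    return split(list(insights), order)


# Toy insights for illustration only (not the study's 147 annotations).
insights = [
    {"state": "Texas", "tag": "transportation", "dimension": "registered vehicles"},
    {"state": "Texas", "tag": "transportation", "dimension": "fuel price"},
    {"state": "California", "tag": "transportation", "dimension": "public transportation"},
    {"state": "California", "tag": "household", "dimension": "household fuel emission"},
]

# Group by state first, then split each state cluster by tag.
by_state = group_by_importance(insights, {"state": 1.0, "tag": 0.5, "dimension": 0.0})
print({state: sorted(sub) for state, sub in by_state.items()})
# prints {'Texas': ['transportation'], 'California': ['household', 'transportation']}
```

Raising the weight of "tag" above "state" regroups the same insights by topic first, which is one way to obtain comparable subsets such as the transportation-related insights for two states.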

To conduct the second task, the researcher first reviewed all the correlation insights in the region graph and identified the controllable factors among them. For example, “average gas price” and “share of public transportation” were important factors affecting transportation emission that could be controlled through policy. The researcher grouped all correlation insights that contained the controllable factors and associated them with the insights about high-emission states in the region graph. In this way, the researcher quickly determined the controllable emission factors for these states and made recommendations accordingly. For example, if a state with high transportation emission had low fuel prices, the researcher would suggest increasing the fuel price to reduce that state’s transportation emission. In total, the researcher annotated 147 insights and created 15 hypotheses during the case study.

Fig. 9 Comparing two insight subsets about “Texas” (left) and “California” (right) using the region graph. All the insights contain the keyword “transportation”.


5.4 Feedback

Overall, the researcher reported that he enjoyed the case study and showed enthusiasm for ManyInsights. He commented that the IM functions provided in ManyInsights fit well into his natural analysis flow and helped drive him to perform in-depth analyses. He particularly liked being able to conduct semi-automatic insight annotation, grouping, and association in a single system. He noted that he previously had to use multiple tools, such as a text editor, tables, and organization charts, to record and manage insights manually, and that transforming and sharing the results among these tools was time-consuming. ManyInsights freed him from these tedious tasks so that he could spend more time analyzing important discoveries, detecting hidden relationships, and conducting reasoning tasks. Moreover, the researcher was impressed by the interactivity and visual interfaces of ManyInsights, such as visually grouping, associating, and interactively browsing insights.

Regarding specific components and functions, the researcher commented that the semi-automatic annotation approach was very useful and that the predefined templates met his annotation needs. He particularly liked the hypothesis generation function, commenting, “Previously, I would have to use the text editor to record the hypotheses and manually associate the findings to the hypotheses. It required much more efforts and I could easily lose track of the associated findings.” Moreover, the researcher pointed out that the tag function was helpful, especially for searching and organizing insights.

The researcher commented that the faceted search interface was intuitive and enjoyable to use. In the training session, he showed great interest in the interface and grasped it with little instruction. During the analysis, the researcher could examine the most frequent items of each attribute through the interface, which he found very convenient. He commented, “It helps me quickly keep an awareness of the analysis state at the moment, such as which datasets had been explored adequately and which one requires more explorations. Manually obtaining this information could require many efforts and distract me from the ongoing analysis.” He found the scented insight browsing similarly useful: “Every time I revisited a visualization I would first examine the small indicators to check what I had [discovered] here. The function led to many unexpected findings and prevented me from making redundant annotations.”

The researcher pointed out that the region graph was incredibly useful and was among the most frequently used tools during the study. He commented, “Overall, the region graph is a wonderful tool for summarizing large numbers of insights and drawing conclusions from them. The layout, the node placement and representation, and the links help me easily interpret the interrelationships and form a comprehensive understanding of the insights I captured.” The researcher appreciated the ability to compare insight groups simultaneously. He said, “The visual comparison is extremely useful for conducting the state-to-state comparison task. It allowed me to identify the differences and similarities quickly and effectively.”

The researcher also suggested potential future improvements. In particular, he emphasized the importance of associating insights involving dimensions at different levels of a dimension hierarchy; for example, a yearly emission trend might provide important context for analyzing monthly or quarterly emissions. He also wanted a dynamic update function for the region graph so that newly captured insights could be displayed and associated with existing insights as they are created.

6 Conclusions

In this paper, we present ManyInsights, a multidimensional visual analytics prototype that supports effective insight annotation, browsing, retrieval, organization, and association. We also report a case study of how a domain expert used ManyInsights to manage his insights in a real visual analytics process. The study provided strong support for the usefulness of ManyInsights and its underlying IM framework.

Acknowledgements

The work was partially supported by the US National Science Foundation (Nos. 0946400 and 0915528) as well as the DHS Visual Analytics for Command, Control, and Interoperability (VACCINE) Center of Excellence, under the auspices of the SouthEast Regional Visual Analytics Center.


References

[1] J. Thomas and K. Cook, Illuminating the Path: The Research and Development Agenda for Visual Analytics, Washington DC: IEEE Computer Society Press, 2005.

[2] Z. Pousman and J. Stasko, Casual information visualization: Depictions of data in everyday life, IEEE Transactions on Visualization and Computer Graphics, vol. 13, no. 6, pp. 1145-1152, 2007.

[3] S. Wehrend and C. Lewis, A problem-oriented classification of visualization techniques, in Proc. 1st Conference on Visualization, 1990, pp. 139-143.

[4] Y. Chen, J. Yang, and W. Ribarsky, Toward effective insight management in visual analytic systems, in IEEE Pacific Visualization Symposium, 2009, pp. 49-56.

[5] F. B. Viegas, M. Wattenberg, F. van Ham, J. Kriss, and M. M. McKeon, ManyEyes: A site for visualization at internet scale, IEEE Transactions on Visualization and Computer Graphics, vol. 13, no. 6, pp. 1121-1128, 2007.

[6] J. Heer, F. B. Viegas, and M. Wattenberg, Voyagers and voyeurs: Supporting asynchronous collaborative information visualization, in Proc. ACM SIGCHI Conference on Human Factors in Computing Systems, 2007, pp. 1029-1038.

[7] Y. B. Shrinivasan and J. van Wijk, Supporting the analytical reasoning process in information visualization, in Proc. ACM SIGCHI Conference on Human Factors in Computing Systems, 2008, pp. 1237-1246.

[8] Y. Chen, S. Barlowe, and J. Yang, Click2Annotate: Automated insight externalization with rich semantics, in IEEE Conference on Visual Analytics Science and Technology, 2010, pp. 155-162.

[9] Y. Chen, J. Alsakran, S. Barlowe, J. Yang, and Y. Zhao, Supporting effective common ground construction in asynchronous collaborative visual analytics, in IEEE Conference on Visual Analytics Science and Technology, 2011, pp. 23-28.

[10] R. Amar, J. Eagan, and J. Stasko, Low-level components of analytic activity in information visualization, in Proc. IEEE Symposium on Information Visualization, 2005, pp. 111-117.

[11] A. C. Robinson, Collaborative synthesis of visual analytic results, in Proc. IEEE Symposium on Visual Analytics Science and Technology, 2008, pp. 67-74.

[12] S. E. Ellis and D. P. Groth, A collaborative annotation system for data visualization, in Proc. Working Conference on Advanced Visual Interfaces, 2004, pp. 411-414.

[13] D. Yang, Z. Xie, E. Rundensteiner, and M. Ward, Managing discoveries in the visual analytics process, ACM SIGKDD Explorations Newsletter, vol. 9, no. 2, pp. 22-29, 2007.

[14] Y. B. Shrinivasan and J. van Wijk, Supporting exploration awareness in information visualization, IEEE Computer Graphics and Applications, vol. 29, no. 5, pp. 34-43, 2009.

[15] W. Jones, H. Bruce, and S. Dumais, Keeping found things found on the web, in Proc. ACM CIKM International Conference on Information and Knowledge Management, 2001, pp. 119-126.

[16] K. P. Yee, K. Swearingen, K. Li, and M. Hearst, Faceted metadata for image search and browsing, in Proc. ACM SIGCHI Conference on Human Factors in Computing Systems, 2003, pp. 401-408.

[17] L. Hong and E. Chi, Annotate once, appear anywhere: Collective foraging for snippets of interest using paragraph fingerprinting, in Proc. ACM SIGCHI Conference on Human Factors in Computing Systems, 2009, pp. 1791-1794.

[18] J. Alsakran, Y. Chen, Y. Zhao, J. Yang, and D. Luo, STREAMIT: Dynamic visualization and interactive exploration of text streams, in Proc. IEEE Symposium on Pacific Visualization, 2011, pp. 131-138.

[19] B. Shneiderman and A. Aris, Network visualization by semantic substrates, IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 5, pp. 733-740, 2006.

Yang Chen is a PhD student in the Computer Science Department at the University of North Carolina at Charlotte. His research interests include visual analytics and information visualization. He received his BS in computer science from Wuhan University.

Jing Yang is an associate professor in the Computer Science Department at the University of North Carolina at Charlotte. Her research interests include visual analytics and information visualization on high dimensional data, graphs, and text documents. She received her PhD in computer science from WPI.

