Generating Personalized Spatial Analogies for Distances and Areas

Yea-Seul Kim
iSchool | DUB

University of [email protected]

Jessica Hullman
iSchool | DUB

University of [email protected]

Maneesh Agrawala
Computer Science Department

Stanford [email protected]

Figure 1. A text article containing a distance, alongside three personalized spatial analogies generated by our system for a user in San Jose, Seattle, and Chicago.

ABSTRACT
Distances and areas frequently appear in text articles. However, people struggle to understand these measurements when they cannot relate them to measurements of locations that they are personally familiar with. We contribute tools for generating personalized spatial analogies: re-expressions that contextualize spatial measurements in terms of locations with similar measurements that are more familiar to the user. Our automated approach takes a user’s location and generates a personalized spatial analogy for a target distance or area using landmarks. We present an interactive application that tags distances, areas, and locations in a text article and presents personalized spatial analogies using interactive maps. We find that users who view a personalized spatial analogy map generated by our system rate the helpfulness of the information for understanding a distance or area 1.9 points higher (on a 7 pt scale) than when they see the article with no spatial analogy and 0.7 points higher than when they see a generic spatial analogy.

Author Keywords
Distance; area; landmark; locator map; spatial analogy; information visualization.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]’16, May 07 - 12, 2016, San Jose, CA, USA
2016 © ACM. ISBN 978-1-4503-3362-7/16/05 ... $15.00.
DOI: 10.1145/2858036.2858440

ACM Classification Keywords
H.5.m. Information Interfaces and Presentation (e.g. HCI): Miscellaneous

INTRODUCTION
People are frequently confronted with spatial measurements in news articles and other text documents. For example, a news article might mention the distance between the site of a plane crash and a nearby village (4.3 mi), or the square footage of a new prison (27,000 ft²). Unfortunately, spatial references like these are difficult for people to understand and reason about. People often struggle with accurately interpreting numbers [37, 42] and find it challenging to judge distances and areas that they cannot easily relate to their experiential understanding of space [17, 21].

One strategy that authors use to directly relate new distances or areas to people’s existing knowledge is to manually contextualize the measurement in terms of well-known locations. For example, 4.3 miles might be conveyed as about the distance between the Empire State Building and the Brooklyn Bridge, or 420 acres as about half the size of Central Park. Such analogies can improve people’s understanding of measurements by conveying the new information through a reference measurement that is well known to the user [34, 35].

However, for a spatial analogy to aid cognition, readers must have prior knowledge about the locations and/or areas of the specific landmarks used in the analogy. While some landmarks may be recognized across populations, experiential knowledge of landmarks varies based on individualized factors like people’s proximity to them [14, 21]. For example, the analogy of 4.3 miles as about the distance between the Empire State Building and the Brooklyn Bridge, or 420 acres as half the size of Central Park, only helps people who have good spatial knowledge of New York City. For a Detroit native, a better analogy for 4.3 mi is about the distance from the Packard Plant to the Greektown casino. But for an author to personalize such analogies to individual users would require knowing which landmarks the individual is familiar with, knowledge that is difficult to acquire.

We contribute tools for generating personalized spatial analogies: re-expressions that contextualize spatial measurements in terms of one or more locations with similar measurements but which are more familiar to a user. We first identify criteria for effective personalized spatial analogies, including the familiarity of the landmarks and how close the landmark’s distance or area is to the target distance or area that is being contextualized. We present a set of automated tools that take a user’s location and generate personalized spatial analogies for a target distance or area. Our approach generates the analogies by identifying landmarks that are similar to the target measurement but more familiar to the user. We present an interactive personalized spatial analogy reader tool that tags distances, areas, and locations in a text article, then applies our automated algorithm to generate a personalized spatial analogy for each measurement. Our application presents the personalized spatial analogies for each target measurement via interactive maps (Fig. 1, 9).

We demonstrate the usefulness of our tools through a user study in which users are presented with text articles containing distances and areas. We find that users who view a personalized spatial analogy map generated by our system rate the helpfulness of the article content for understanding the distance or area 1.9 points higher (on a 7 pt scale) than when they see no spatial analogy and 0.7 points higher than when they see the article with a generic spatial analogy.

RELATED WORK
Our work builds on prior research in three main areas: (1) studies in psychology and geography of how people reason about spatial information and the importance of landmarks in mental maps, (2) automated techniques for providing geographical context for locations, and (3) techniques for extracting landmarks from text and images.

How People Reason About Spatial Measurements
Cognitive geographers have described how mental models of space are metrically distorted: people rapidly become less spatially accurate and remember fewer details as locations get further away from the areas where they spend the most time (e.g., their home neighborhood) [14, 21, 45]. As a result, the locations that are encoded in the mental maps of different people may differ considerably. Thus, our approach for generating personalized spatial analogies relies on the distance of a user to a location as one proxy for their familiarity with that location.

“Landmarks” are salient reference nodes in a person’s mental map of an environment that are distinctive from other points and therefore remembered more easily [33, 19, 14, 36, 43, 10]. Cognitive geographers describe several specific ways in which landmarks can be significant to people [33, 19, 14, 36, 10, 38, 29]. A landmark may be generally significant across a wide group of people for historical, spatial, or visual reasons [33, 10]. For example, the St. Louis Gateway Arch is significant for visual reasons, as the largest arch in the world and a taller structure than the surrounding St. Louis landscape, and for historical reasons, as a memorial to westward expansion. A landmark may also be personally significant to an individual based on his or her particular experiences with it [10]. For example, a neighborhood Starbucks coffee shop might be significant to a resident of the neighborhood because she visits it regularly. Our approach considers both forms of landmark significance in selecting the landmarks to use in a personalized spatial analogy.

Though not aimed exclusively at providing geographical context, prior work has studied the benefits of manually created measurement analogies in which familiar classes of objects (e.g., a shopping mall, an adult male, etc.) are used to provide context for an unfamiliar measurement (e.g., 1 million ft², 80 kg) [34, 35]. Chevalier et al. provide a design study of concrete scales representations (visual re-expressions of complex measures), including strategies like reunitization, which we use [13]. Their work provides high-level design guidance that influences our approach, including the importance of personal familiarity and an interpretable multiplier. However, our goal is to automate the process of generating measurement analogies that apply landmarks to re-express distances and areas using a database of landmarks that we extract from location-based services. To do so, we identify proxies that can be used to predict the personal familiarity of landmarks.

Providing Context for Spatial Measurements
Informed by how people think about space and landmarks, researchers have developed systems to generate maps that contextualize spatial measurements. Traffigram [28] uses isochronal cartography to generate maps that distort paths based on anticipated travel time. NewsViews [20] generates annotated thematic maps to accompany news articles that are relevant to spatially distributed data, such as a choropleth map of unemployment rates for an article about unemployment. Atlasify [27] generates maps that relate arbitrary text queries (e.g., “nuclear power”) to spatial reference systems (e.g., a choropleth map depicting associations between “nuclear power” and countries in the world). Similar to these systems, we develop an automated approach to generate maps that relate spatial information to familiar concepts. However, our approach addresses a different problem than the prior systems: how to produce spatial analogies of measurements in terms of landmarks that are personally familiar to the user.

One simple strategy to help people understand locations is to present a “locator map”: a map showing the locations that are mentioned in the text (Fig. 2). Studies have shown that adding locator maps to a text article improves understanding of the article by allowing for simultaneous textual and visual learning [24]. However, while locator maps are intended to help users better understand where countries and other locations are located, the goal of our work is to design an approach that can make spatial measurements, such as distances and areas, easier for the user to understand.

Figure 2. A locator map.

Figure 3. Diagram of our approach. The user’s location, e.g., the center of Seattle, and a target distance or area are used to generate an analogy in terms of a landmark from a large landmark dataset. The goals in selection are to find a landmark that is familiar to the user and close to the target measurement in distance or area.

Extracting Landmarks From Images and Text
Researchers have also developed techniques for extracting landmarks from large image and architectural data sets based on their distinguishing visual features, for example by using computer vision and structural information [8, 16, 23, 30, 39]. Other approaches mine landmarks from web text using text analysis techniques to determine landmark significance [43, 44]. We similarly leverage online sources to extract landmarks and assess their general familiarity across users. However, our interest is in applying the extracted data to generate personalized spatial analogies, rather than in developing a comprehensive set of landmarks.

APPROACH
Our goal is to generate personalized spatial analogies: expressions that contextualize a target spatial measurement (a distance or area) by relating it to spatial properties of one or more locations that are familiar to a user. Our approach takes two inputs: the user’s geographical location and a target distance or area (Fig. 3). These inputs are used to identify a landmark from a large landmark dataset that satisfies two criteria: the landmark is personally familiar to the user and the landmark has a distance or area close to the target.

When the target measurement is a distance, our system generates a personalized analogy relating this distance to the distance between the user’s location and a landmark. For example, an analogy might re-express a target distance of 11 miles as 2 times the distance from the user to a park that is located 5.5 miles from the user. When the target measurement is an area, the personalized analogy relates the target area to the area of a landmark. For instance, an analogy might re-express 200 acres as about 0.5 times the area of a 400 acre park near the user.

Criteria for Effective Spatial Analogies
We consider two criteria for effective spatial analogies that emerge from psychology research on landmark familiarity and numeracy.

Familiarity of the Analogous Spatial Property
For the spatial analogy to make the target measurement easy to understand, the spatial property of the landmark introduced in the analogy (i.e., the distance from user to landmark or the area of the landmark) must be familiar to the user. We use the term personal familiarity to refer to the user’s level of familiarity with the analogous spatial property of one or more landmarks. While we cannot easily extract the personal familiarity of landmarks to a given user, personal familiarity is directly impacted by proximity [14, 21]. We therefore use proximity, or distance from the landmark’s point location to the user’s point location, as a proxy for personal familiarity. However, proximity alone does not fully capture a user’s personal familiarity with a landmark. Some landmarks are recognizable to many people even if they are far away, for example due to their cultural or historical significance. We refer to this type of popularity as general familiarity. Our approach combines information on the user’s proximity to the landmark with how generally familiar a landmark is across users to better capture personal familiarity.

Closeness of the Analogous Spatial Property to the Target
Our analogies relate a target distance or area to a multiple of the analogous spatial property of a landmark: e.g., 200 acres is 0.5 times the area of a 400 acre park near the user. We refer to the multiplicative factor for converting from the target measurement to the new measurement of the analogy (e.g., 0.5) as the multiplier. Research in number sense suggests that people are able to accurately reason about numbers between 1 and 3, though numbers between 3 and 10 also result in reasonably accurate inferences [7, 15, 18]. As numbers grow greater than 10, or dip below 1, the precision and ease with which people can reason about them decreases [7, 15, 18]. Hence, effective spatial analogies should avoid multipliers less than 1 or greater than 10.
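As a worked illustration of the multiplier, the two examples used in this section can be written out explicitly. Here $t$ is the target measurement and $p$ is the landmark's analogous spatial property; the symbol $m$ for the multiplier is a shorthand introduced only for this illustration.

$$
t = m \cdot p \quad\Longrightarrow\quad m = \frac{t}{p}
$$

$$
m_{\text{area}} = \frac{200\ \text{acres}}{400\ \text{acres}} = 0.5,
\qquad
m_{\text{distance}} = \frac{11\ \text{mi}}{5.5\ \text{mi}} = 2
$$

Under the criterion above, the distance example falls in the range that is easiest to reason about (between 1 and 3), whereas the area example has a multiplier below 1 and would therefore be a less preferred analogy, all else being equal.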

OBTAINING LANDMARKS AND SPATIAL PROPERTIES
Our approach relies on a set of landmarks to generate personalized spatial analogies (Fig. 3). We extract landmarks and their spatial properties, as well as a proxy for general familiarity, from online sources.

Identifying Landmarks and Their Spatial Properties
Yelp¹ is a crowd-sourced website containing reviews of thousands of businesses (e.g., restaurants, dry cleaners, etc.) and commonly visited places (e.g., monuments, parks, libraries). To seed our landmark database, we extract locations of businesses and places from Yelp (we limit to U.S. locations in our prototype implementation). However, since some service businesses are less likely to be associated with a well-known location (e.g., the office location of a tree removal service may not be known even by clients of the service), we do not include businesses in the Yelp service business categories (Event Planning & Services, Financial Services, Home Services, Local Services, Professional Services) in our database. For each of the non-service locations we store a name and location (in latitude/longitude coordinates) as generated by the Yelp API [4] in our seed database. With this procedure we obtain 242,194 Yelp locations as our seed set of landmarks.

¹ http://www.yelp.com/

Figure 4. U.S. maps depicting all landmarks (left) and area landmarks(right) in our dataset.

To re-express and visually present area analogies we must also collect each landmark’s area and footprint polygon. We first filter the landmarks to a subset of area landmarks: locations whose area is likely to be recognizable to people [26, 40]. While a person may be familiar with many locations near him or her (e.g., a gas station, an office building, a department store), it is not necessarily the case that he or she will be able to easily recognize the area of these places. For example, a person may find it difficult to imagine the area of a department store where he or she shops because the store has multiple floors or includes areas that shoppers cannot access. On the other hand, some areas have clearer boundaries. We use a heuristic for determining area landmarks such as parks based on prior work that differentiates area landmarks from other landmarks [26, 40]. Specifically, in examining the categories of landmarks represented on Yelp we found that only a handful of categories describe landmarks whose area is likely to be recognizable: Parks, Stadiums & Arenas, Amusement Parks, Botanical Gardens, or Campgrounds. For area analogies we limit our dataset to the locations that fall in these categories, yielding 15,950 area landmarks. We also add U.S. states to this set of area landmarks, since the boundaries of U.S. states are likely to be recognizable to many users in the U.S.
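The two category filters described in this section (dropping service businesses from the seed set, then keeping only recognizable-area categories for area analogies) can be sketched as follows. This is a minimal illustration rather than the authors' code: the record layout (`name`, `categories`) and the function names are hypothetical, and treating a location as a service business if any one of its categories is a service category is an assumption the text does not settle.

```python
# Category names are taken from the text; everything else is illustrative.
SERVICE_CATEGORIES = {
    "Event Planning & Services",
    "Financial Services",
    "Home Services",
    "Local Services",
    "Professional Services",
}
AREA_CATEGORIES = {
    "Parks",
    "Stadiums & Arenas",
    "Amusement Parks",
    "Botanical Gardens",
    "Campgrounds",
}


def is_seed_landmark(place):
    """Keep a place unless it carries any service category (assumed rule)."""
    return not (set(place["categories"]) & SERVICE_CATEGORIES)


def is_area_landmark(place):
    """Keep seed landmarks whose category suggests a recognizable area."""
    return is_seed_landmark(place) and bool(set(place["categories"]) & AREA_CATEGORIES)


# Hypothetical records standing in for locations returned by a listing service.
places = [
    {"name": "Green Lake Park", "categories": ["Parks"]},
    {"name": "Evergreen Tree Removal", "categories": ["Home Services"]},
    {"name": "Corner Diner", "categories": ["Restaurants"]},
]

seed = [p["name"] for p in places if is_seed_landmark(p)]
area = [p["name"] for p in places if is_area_landmark(p)]
```

In this toy input the tree removal service is dropped from the seed set, and only the park survives the area filter.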

For area landmarks, the polygon footprints of the landmarks are required to visualize them in the map. Yelp does not provide the polygon footprint for locations. Therefore, we extract this information from OpenStreetMap (OSM), a collaborative, open source mapping resource containing spatial features (nodes, ways, polygons) for locations around the world [25]. However, Yelp and OSM often use slightly different names to refer to the same location (e.g., “SkyCity at the Needle” vs. “Sky City Restaurant”). We use a bottom-up approach to ensure that we accurately associate OSM areas and polygons with Yelp locations.

We first extract all continental U.S. locations from OSM for which the area property is not null (293,311 locations) and a footprint polygon is available. We then match the OSM locations to Yelp locations by comparing the names of both locations using ngram similarity (a measure of the similarity between two strings [2]) and the geographical coordinates of both locations (using the centroid coordinates for OSM locations). If the names of the two landmarks are close enough (i.e., similarity above 0.5) and the coordinates of the two landmarks are close enough (i.e., the absolute difference is less than the average distance between matched landmarks: Latitude = 0.000169338147836, Longitude = 0.000172519196556), we consider the OSM landmark to be the same as the Yelp landmark and retain the area and polygon. This leaves 5,355 area landmarks. On a test set of 300 manually evaluated landmarks we obtain precision of 96.7% and recall of 68.4% using this matching process. Fig. 4 depicts the spatial coverage of the full set of landmarks (left) and the area landmarks (right; U.S. state area landmarks not pictured).
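The matching rule above can be sketched in a few lines. This is an illustration under stated assumptions: the similarity here is a Jaccard overlap of padded character trigrams, which may differ from the ngram measure the paper cites as [2], and the record fields (`name`, `lat`, `lon`) and function names are hypothetical. Only the 0.5 name threshold and the two coordinate thresholds come from the text.

```python
NAME_THRESHOLD = 0.5
MAX_LAT_DIFF = 0.000169338147836  # degrees, from the text
MAX_LON_DIFF = 0.000172519196556  # degrees, from the text


def char_ngrams(text, n=3):
    """Set of lowercase character n-grams, padded so short names still yield grams."""
    pad = " " * (n - 1)
    padded = f"{pad}{text.lower()}{pad}"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap of character n-grams, in [0, 1] (assumed stand-in measure)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def is_same_landmark(yelp, osm):
    """True when names are similar enough and coordinates are close enough."""
    name_ok = ngram_similarity(yelp["name"], osm["name"]) > NAME_THRESHOLD
    lat_ok = abs(yelp["lat"] - osm["lat"]) < MAX_LAT_DIFF
    lon_ok = abs(yelp["lon"] - osm["lon"]) < MAX_LON_DIFF
    return name_ok and lat_ok and lon_ok


yelp_place = {"name": "Green Lake Park", "lat": 47.6800, "lon": -122.3300}
osm_near = {"name": "Green Lake Park", "lat": 47.6801, "lon": -122.3300}
osm_far = {"name": "Green Lake Park", "lat": 47.6900, "lon": -122.3300}
```

Requiring both conditions favors precision over recall, which is consistent with the reported 96.7% precision and 68.4% recall.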

Assessing General Familiarity
We assess the general familiarity of each landmark by relying on how frequently it appears in photos in a large user-generated database. While other sources of familiarity information are available for Yelp landmarks (such as how many users have reviewed the landmark on Yelp), certain types of landmarks (e.g., restaurants, nightclubs) consistently receive more reviews than other common landmarks (e.g., monuments, bridges) [32]. We found that the number of photos taken of a landmark across a large set of users was less biased toward certain types of locations.

Specifically, for each of the 242,194 landmarks in our set, we use the Flickr API² to find the total number of Flickr photos from Jan. 2010 to Jan. 2015 for which the landmark name appears in the photo tag and the photo was taken within 1 mile of the landmark’s geographic coordinates. According to this measure of general familiarity, the most widely popular landmarks are Walt Disney World, Central Park, Golden Gate Park, Epcot Center, and the Washington Monument. Approximately 70% of the landmarks (169,128), such as many small retailers, government offices, and motels, do not appear in any Flickr photos. We do not omit these landmarks from the database, however, as there may be cases where they are very near to the user’s location (i.e., high personal familiarity) and also close in area or distance to the target measurement. The remaining set of 73,117 landmarks has a mean photo count of 572.1 and a median of 10. For area landmarks that are U.S. states, we record the photo count as the sum of the photo counts we obtain for all landmarks of the 242,194 in that state (mean: 408,552.5, median: 227,952).

Our landmark dataset is available in a public repository³.

GENERATING ANALOGIES USING ENERGY FUNCTIONS
Our approach takes a distance or area and the geographical coordinates of the user’s location as input and uses an energy function minimization approach. The function considers a linear combination of terms that together capture the criteria for effective spatial analogies: closeness of the analogous property to the target measurement (multiplier), personal familiarity (proximity), and general familiarity.

Specifically, we define the energy function $E(\ell)$ of a landmark $\ell$ given user $u$ and target distance $t$ as:

$$
E(\ell) = W_{pf} E_{pf}(u, \ell) + W_{gf} E_{gf}(\ell) + W_{mult} E_{mult}(t, u, \ell) \tag{1}
$$

where $E_{pf}(u, \ell)$ is a function of the personal familiarity of $\ell$ to $u$, $E_{gf}(\ell)$ is a function of the general familiarity of $\ell$, and $E_{mult}(t, u, \ell)$ is a function of the multiplier required for the distance between $u$ and $\ell$ to be equivalent to $t$.

² https://www.flickr.com/services/api/
³ https://bitbucket.org/yeaseulK/chi16-geospatial-analogy

We define the personal familiarity of a landmark to a given user, $E_{pf}(u, \ell)$, as:

$$
E_{pf}(u, \ell) = \mathrm{distance}_v(u, \ell)
$$

where $\mathrm{distance}_v(u, \ell)$ is the Vincenty distance between $\ell$ and $u$. Vincenty distance is a slightly more accurate measure of distance than Euclidean distance as it accounts for the shape of the earth [46]. Our use of proximity is based on the observation that landmarks that are closer to a user tend to be more personally familiar [14, 21]. For area landmarks including states, we use the centroid of the landmark’s polygon to calculate proximity to a user.
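To make the proximity term concrete, the sketch below computes a great-circle distance between two latitude/longitude points. Note that this is the haversine formula on a spherical earth, swapped in for brevity; it is not the Vincenty formula the paper uses, which works on an ellipsoid and is slightly more accurate. The function name, the choice of miles as the unit, and the mean earth radius constant are assumptions made for this illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean earth radius; spherical approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# User locations that appear in the paper's figure captions.
seattle = (47.606831, -122.332427)
las_vegas = (36.170864, -115.127571)
seattle_to_las_vegas = haversine_miles(*seattle, *las_vegas)
```

For the short user-to-landmark distances that dominate the personal familiarity term, the difference between a spherical and an ellipsoidal model is small, so a sketch like this is adequate for exploring the energy function.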

We define the general familiarity of a landmark $E_{gf}(\ell)$ as:

$$
E_{gf}(\ell) = \frac{1}{c_{Flickr}(\ell) + 1} \tag{2}
$$

where $c_{Flickr}(\ell)$ is the Flickr photo count for landmark $\ell$. Since high photo count landmarks are likely to be more generally familiar and our goal is to minimize the energy, we take the inverse of the photo count, adding one to the denominator to avoid division by 0.

$E_{mult}(t, u, \ell)$ considers the multiplicative factor $\mathrm{mult}(t, u, \ell) = t / p$ required to convert the analogous spatial property $p$ of $\ell$ (the distance of $u$ to $\ell$ for distance analogies or the area of $\ell$ for area analogies) to the target measurement $t$. We define $E_{mult}(t, u, \ell)$ as:

$$
E_{mult}(t, u, \ell) =
\begin{cases}
1 / \mathrm{mult}(t, u, \ell) & \text{if } 0 \le \mathrm{mult}(t, u, \ell) < 1 \\
0 & \text{if } 1 \le \mathrm{mult}(t, u, \ell) < 3 \\
0.1 & \text{if } 3 \le \mathrm{mult}(t, u, \ell) < 10 \\
\dfrac{\mathrm{mult}(t, u, \ell) - 10}{2} + 1 & \text{if } 10 \le \mathrm{mult}(t, u, \ell)
\end{cases}
\tag{3}
$$

$E_{mult}$ penalizes a landmark more heavily once the multiplier falls below 1 or exceeds 10, as the numbers 1 through 10 tend to be easier for people to reason with than numbers outside of this range [7, 15, 18]. We define the particular functions such that a multiplier just below 1 and one just above 10 have the same penalty. When the multiplier of $\ell$ is between 1 and 3 (the most familiar numbers based on psychology studies of the reaction time required to recognize different quantities [7, 15, 18]), we set $E_{mult}$ to 0, and we assign a slight penalty when the multiplier is between 3 and 10 [7, 15, 18] (Fig. 5).
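The piecewise multiplier penalty in Eq. 3 translates directly into code. The sketch below is an illustration, not the authors' implementation; the function name `e_mult` is chosen here, and returning infinity for a multiplier of exactly 0 (where 1/mult is undefined) is an assumption added to keep the function total.

```python
import math


def e_mult(mult):
    """Penalty for a multiplier, following the four cases of Eq. 3."""
    if mult < 0:
        raise ValueError("multiplier must be non-negative")
    if mult == 0:
        return math.inf  # assumption: 1/mult is undefined at 0
    if mult < 1:
        return 1.0 / mult
    if mult < 3:
        return 0.0
    if mult < 10:
        return 0.1
    return (mult - 10) / 2 + 1


# Penalties at a few representative multipliers.
penalties = {m: e_mult(m) for m in (0.5, 2, 5, 10, 12)}
```

Evaluating the boundaries shows the continuity property the text describes: the penalty approaches 1 as the multiplier approaches 1 from below, and equals 1 at a multiplier of 10.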

We use a weighted global criterion method [47] for multi-criteria optimization to find the values for the weights. We repeatedly varied the inputs and weights while maintaining an a priori relative order of weights that we motivate based on prior geographical and numerical cognition research. Specifically, we weight personal familiarity (as proximity) the most, as a landmark that the user is not personally familiar with is unlikely to provide a useful distance or area reference. We weight the multiplier only slightly less than personal familiarity, as extremely large or small multipliers are also likely to make it much harder for a user to understand the analogy. We weight general familiarity the least since it does not necessarily predict whether the user is familiar with the spatial properties of the landmark.

Figure 5. Plot showing the penalty assigned by the $E_{mult}(t, u, \ell)$ function as a function of the magnitude of the multiplier.

Finally, we assign the following specific weights to each term:

$$
E(\ell) = 5 E_{pf}(u, \ell) + E_{gf}(\ell) + 4 E_{mult}(t, u, \ell) \tag{4}
$$

Depending on whether the target measurement is a distance or an area, we apply the energy function to the set of all 242,194 Yelp landmarks or only the 5,355 area landmarks, respectively. We then select the lowest energy landmark for use in the spatial analogy.
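Putting the three terms together, landmark selection amounts to evaluating the weighted energy for every candidate and taking the minimum. The sketch below is a minimal illustration under several assumptions: the `Landmark` record, the function names, and the precomputed `distance_to_user` field are hypothetical; distances are taken to be in miles, which the paper does not state; and candidates with a missing or zero spatial property are given infinite energy. The weights 5, 1, and 4 are the ones given in Eq. 4.

```python
import math
from dataclasses import dataclass
from typing import Optional

W_PF, W_GF, W_MULT = 5.0, 1.0, 4.0  # weights from Eq. 4


@dataclass
class Landmark:
    name: str
    distance_to_user: float      # proximity, in assumed units of miles
    photo_count: int             # Flickr photo count
    area: Optional[float] = None  # only needed for area analogies


def e_mult(mult):
    """Multiplier penalty from Eq. 3 (infinite for non-positive multipliers)."""
    if mult <= 0:
        return math.inf
    if mult < 1:
        return 1.0 / mult
    if mult < 3:
        return 0.0
    if mult < 10:
        return 0.1
    return (mult - 10) / 2 + 1


def energy(target, landmark, kind="distance"):
    """Weighted energy of one landmark for a target distance or area."""
    prop = landmark.distance_to_user if kind == "distance" else landmark.area
    if prop is None or prop <= 0:
        return math.inf  # assumption: unusable candidates are never selected
    e_pf = landmark.distance_to_user
    e_gf = 1.0 / (landmark.photo_count + 1)
    return W_PF * e_pf + W_GF * e_gf + W_MULT * e_mult(target / prop)


def best_landmark(target, landmarks, kind="distance"):
    """Return the lowest-energy landmark."""
    return min(landmarks, key=lambda lm: energy(target, lm, kind))


candidates = [
    Landmark("Far Park", distance_to_user=5.5, photo_count=100),
    Landmark("Corner Shop", distance_to_user=0.5, photo_count=0),
    Landmark("City Museum", distance_to_user=2.0, photo_count=9),
]
choice = best_landmark(11.0, candidates)
```

For an 11 mile target, the nearby shop is penalized for its multiplier of 22 and the distant park for its low proximity, so the museum 2 miles away is selected with an energy of 10.5. A linear scan like this is sufficient for a few hundred thousand landmarks.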

Precision and Rounding
Retaining a high degree of precision (i.e., many decimal places) in reporting numeric values can make numbers more complex for users to interpret [31]. Rounding is a strategy that people use naturally to reduce complexity [11, 41]. While rounding can entail a loss of precision, hedges like “about” or “approximately” are often used to communicate that a value has been rounded [5].

To make multipliers easier to interpret while maintaining an upper bound on the loss of precision, we round multipliers to the nearest integer but adaptively retain extra decimal places as necessary until the rounded version is less than 5% different from the target measurement.
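One way to read the adaptive rounding rule is sketched below. It relies on an interpretation that the text does not spell out: since the re-expressed measurement is the rounded multiplier times the landmark's property, a relative error under 5% in the measurement is equivalent to a relative error under 5% in the multiplier itself. The function name and the `max_decimals` cap are assumptions made for this illustration.

```python
def round_multiplier(mult, tolerance=0.05, max_decimals=6):
    """Round to the fewest decimal places that keep relative error under tolerance."""
    if mult <= 0:
        raise ValueError("multiplier must be positive")
    for decimals in range(max_decimals + 1):
        rounded = round(mult, decimals)
        if abs(rounded - mult) / mult < tolerance:
            return rounded
    return mult  # fall back to the unrounded value


examples = [round_multiplier(m) for m in (2.02, 1.4, 0.26)]
```

In these examples, 2.02 becomes 2.0 (about 1% error), while 1.4 and 0.26 keep one and two decimal places respectively because coarser rounding would exceed the 5% bound.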

Results
Figures 6 and 7 show the top three personalized spatial analogies produced by our approach for various user point locations and a set of distance and area targets, respectively. Many of the results include landmarks that are close to the user while still maintaining reasonable multipliers. As the input gets larger, multipliers grow larger than 10 to allow for landmarks that are more personally familiar to the user (i.e., higher proximity).

To assess the impact of each energy term in our function independently, we generated similar tables for a set of distance and area inputs and user point locations, with one term in the energy function turned off at a time. These results are available in our public repository⁴.

APPLICATION
We develop an interactive personalized spatial analogy application that takes a geographical location entered by the user and a distance or area and generates a personalized spatial analogy.

⁴ https://bitbucket.org/yeaseulK/chi16-geospatial-analogy

Figure 6. The top three personalized distance analogies generated by our approach for a range of targets and user locations.

Figure 7. The top three personalized area analogies generated by our approach for a range of targets and user locations.

User Experience
Upon first using our application, a user is prompted to provide the address or spatial coordinates of a place that is highly familiar to them, such as their home address or a longtime previous residence. In browsing the web, when the user opens a text article, our application highlights distances and areas in the text, as well as all references to locations (including neighborhoods, cities, states, countries, and other locations, e.g., the Pentagon, University of Arizona, etc.). If the article references one or more locations, a locator map depicting the locations is presented to the right of the article (Fig. 8).

When the user clicks on any distance or area, a second map presenting a personalized spatial analogy for that measurement appears below the locator map (Fig. 9 lower left). When the user clicks on any single location in the text, the locator map is scaled to focus on the selected location, and a personalized area analogy map appears below the locator map (Fig. 9 right). The analogy map presents the selected location’s area (e.g., a country) as a transparent overlay on the area landmark returned by our energy function approach.

If multiple locations (cities, neighborhoods, landmarks) are mentioned in the text, the user can select any pair of locations by clicking on them while pressing the shift key. The locator map is zoomed to show the two selected locations, and a personalized distance analogy map is shown below the locator map (Fig. 8; Fig. 9, center map).

System Implementation
Our application generates personalized spatial analogies in three steps (Fig. 10). First, a tagger identifies and tags distances and areas that are referenced in the text article, and identifies the point location of the address provided by the user using the Google Maps Geocoding API [22]. Second, when one of the tagged measurements is selected from an article, the application generates the personalized spatial analogy for the target measurement using our energy function and landmark database. Third, the top ranking spatial analogies returned by the energy function are visualized in interactive maps.

Figure 8. A user located in Las Vegas, NV (36.170864, -115.127571) interacts with our system as she reads a text article. By default, a locator map depicts locations referenced in the text. The user selects several cities in the article to see the distance between these locations contextualized through a personalized spatial analogy.

Step 1: Tag Spatial Measurements and Locations
In step 1 we tag all distances and areas as well as locations that appear in the article text. To identify distances and areas we match against a small set of regular expressions of the form:

[number][optional separator][unit]

where [number] matches any string of numeric digits with or without thousands separators, [optional separator] matches whitespace or dash (e.g., “-”) characters, and [unit] matches any unit expression for distance or area (specifically, we match various spellings of meters, kilometers, feet, yards, and miles for distance and various spellings of the same units squared plus acres for area).
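The [number][optional separator][unit] pattern described in this step can be sketched with Python's `re` module. This is an illustrative reconstruction, not the authors' expressions: the exact unit spellings, the support for decimal numbers such as 4.3, and the function name are assumptions, and the single-letter abbreviation "m" is left out here to avoid false matches.

```python
import re

# Digits with or without thousands separators; the optional decimal part is an assumption.
NUMBER = r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?"
# A few spellings of the distance units named in the text.
LINEAR = r"(?:kilomet(?:er|re)s?|km|met(?:er|re)s?|miles?|mi|feet|foot|ft|yards?|yd)"
# The same units squared, plus acres.
AREA = rf"(?:(?:square|sq\.?)\s+{LINEAR}|{LINEAR}(?:2|²)|acres?)"

MEASUREMENT = re.compile(
    rf"(?<![\w.,])(?P<number>{NUMBER})[\s-]?"
    rf"(?:(?P<area>{AREA})|(?P<distance>{LINEAR}))(?!\w)",
    re.IGNORECASE,
)


def tag_measurements(text):
    """Return (number, unit, kind) tuples for each distance or area found in text."""
    tags = []
    for match in MEASUREMENT.finditer(text):
        if match.group("area"):
            tags.append((match.group("number"), match.group("area"), "area"))
        else:
            tags.append((match.group("number"), match.group("distance"), "distance"))
    return tags


sample = ("The crash site was 4.3 miles from the village, "
          "near a 27,000 sq ft prison on 420 acres.")
tags = tag_measurements(sample)
```

Area units are tried before distance units so that "sq ft" or "ft2" is not tagged as a plain distance, and the trailing lookahead prevents "4 minutes" from matching the abbreviation "mi".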

To identify locations in the text we apply named entity recognition (NER) tools on the article text. Specifically, we follow the approach of Gao et al. [20]. We first use Wikifier [12] to tag locations. Wikifier identifies named entities in text for which a Wikipedia article exists. For all tagged entities for which an article is found, we attempt to extract the geocoordinates and area from the article using the Wikipedia API [48]. We then use OpenCalais [3], a general-purpose NER tool, to remove any of the Wikifier locations that OpenCalais identifies as a person rather than a location. Those entities that remain are a good proxy for locations in the article. Gao et al. [20] use a gold standard set of human-tagged locations in 47 articles to show that this location tagging approach achieves 92.7% precision and 42.6% recall.

This process results in a set of mentioned locations from the text along with their point locations and areas from Wikipedia. For all locations for which we successfully obtain an area, we attempt to obtain the geographical polygon for that location from OpenStreetMap.

The set of areas corresponding to identified locations and the set of distances between all pairs of identified locations become inputs for our energy functions.

Step 2: Generating Personalized Spatial Analogies
For each distance and area that results from step 1, we apply our energy function approach. The output of applying our energy function to a target distance or area includes the name, spatial coordinates, and multiplier for the top ranked landmark, as well as its footprint polygon for target measurements that are areas.

Step 3: Presenting Personalized Analogy
A renderer presents the results of step 2. The renderer highlights the tagged distances, areas, and locations in the text. If the article contains one or more location references, a locator map depicting the locations is presented to the right of the text. When a user clicks on a highlighted distance, area, or location, they are presented with a map depicting the top ranked personalized spatial analogy for that measurement below the locator map (Fig. 8, 9).

Visualization: We implement all personalized distance analogy maps and the personalized area analogy maps for explicitly-referenced areas (e.g., 13 acres) using the JavaScript-based library Leaflet [1]. Maps use the Web Mercator projection. The map extent is set using the default map extent approach provided in Leaflet, which centers the map on the centroid of all locations that are presented. All maps allow the user to zoom in and out using a +/- control in the top left of the map.
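For readers unfamiliar with Web Mercator, the sketch below shows the standard spherical forward projection and the way it inflates areas away from the equator. It is written in Python for illustration only and does not use Leaflet. The `centroid` helper is a simplifying assumption (a plain average of latitudes and longitudes that ignores the antimeridian) and is not a description of Leaflet's internals.

```python
import math

WEB_MERCATOR_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def web_mercator(lat_deg, lon_deg):
    """Project (lat, lon) in degrees to Web Mercator (x, y) in meters."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = WEB_MERCATOR_RADIUS_M * lon
    y = WEB_MERCATOR_RADIUS_M * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


def area_inflation(lat_deg):
    """Factor by which Mercator enlarges small areas at a given latitude."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


def centroid(points):
    """Naive centroid of (lat, lon) points: arithmetic mean of each coordinate."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)


# A small region near Seattle is drawn about twice as large as the same
# region would be at the equator.
seattle_inflation = area_inflation(47.606831)
```

This distortion matters little when a map shows one neighborhood, but it becomes significant when two regions at different latitudes are compared by area.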

We construct the personalized area analogy maps for locations that are mentioned in the text using D3 [6]. D3 lets us use a different projection, the Lambert azimuthal equal-area projection, that minimizes the distortion of areas between different locations on the globe. This projection ensures that the user can accurately compare the area of the selected location to that of the landmark returned as the top result by the energy function.
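The Lambert azimuthal equal-area projection has a compact closed form on the unit sphere, sketched below in Python for illustration. This is the textbook formula and not D3's implementation; the function name and default projection center are assumptions.

```python
import math


def lambert_azimuthal_equal_area(lat_deg, lon_deg, center_lat_deg=0.0, center_lon_deg=0.0):
    """Project (lat, lon) in degrees onto the plane, for a unit sphere.

    A point at angular distance c from the projection center lands at
    radius 2*sin(c/2), which is what makes the projection area-preserving.
    """
    phi = math.radians(lat_deg)
    lam = math.radians(lon_deg)
    phi0 = math.radians(center_lat_deg)
    lam0 = math.radians(center_lon_deg)

    # Cosine of the angular distance between the point and the center.
    cos_c = math.sin(phi0) * math.sin(phi) + math.cos(phi0) * math.cos(phi) * math.cos(lam - lam0)
    if cos_c <= -1.0 + 1e-12:
        raise ValueError("the antipode of the projection center cannot be projected")

    k = math.sqrt(2.0 / (1.0 + cos_c))
    x = k * math.cos(phi) * math.sin(lam - lam0)
    y = k * (math.cos(phi0) * math.sin(phi) - math.sin(phi0) * math.cos(phi) * math.cos(lam - lam0))
    return x, y
```

Because equal areas on the sphere map to equal areas on the plane, a landmark footprint and a target region drawn with this projection can be compared visually by size regardless of their latitudes.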

EVALUATION: USER STUDY
To evaluate our tools, we conduct a controlled user study in which participants are presented with text articles containing measurements. We hypothesize that users will find articles containing personalized spatial analogies generated by our system more helpful for understanding measurements than the most common baseline case in which an article does not contain analogies (H1). We also speculate that users will find articles containing personalized spatial analogies from our system more helpful than articles containing generic spatial analogies (H2). To generate generic spatial analogies, we applied our energy functions using a fixed location in New York City (the location of the Empire State Building) rather than the user's location. Generic spatial analogies are equivalent to the analogies that our approach would generate for a user who is located at the Empire State Building. This form of analogy is similar to the type of general purpose analogy that a journalist at the New York Times might construct.

Figure 9. An article containing spatial references for a user located in Seattle, WA (47.606831, -122.332427). When a user clicks on a distance or shift-clicks on two locations (e.g., cities) in the text, a personalized distance analogy map is presented (lower left, center). When a user clicks on an area or single location, a personalized area analogy map is presented (top left, right).

Study Design
We conducted our study using a between-subjects repeated measures design. We collected 10 text articles containing distances and areas and selected a single measurement in each article. For each article and measurement, we created one version of the article that includes a personalized spatial analogy map, and one "baseline" version of the article that includes no map. This comparison allowed us to assess H1, that users would find personalized spatial analogies more helpful for understanding measurements relative to a typical presentation of a text article. To assess H2, that users would find personalized spatial analogies more helpful for understanding measurements relative to a generic spatial analogy, we also created one version of each article that presents a "generic spatial analogy", which we generated by applying our energy function approach but with the fixed New York City location of the Empire State Building.

Study Procedure and Participants
For each article (trial), we highlight the selected distance or area. The participant is instructed to read the article. The participant then answers several questions. First, if the trial includes an analogy map, the participant is asked to:

• (Landmark familiarity): Rate how familiar you are with the landmark you see on the analogy map. (7 pt Likert)

This question serves as a manipulation check to confirm that the landmarks that appear in personalized spatial analogies are in fact more familiar to the user than those in generic analogies.

We also ask the participant to answer two additional questions (all trials):

• (Helpfulness for understanding measurement): How helpful is the content of the article (including text, images, and graphics) for helping you understand the size of the highlighted measurement? (7 pt Likert)

• (Factors affecting helpfulness): Briefly describe how the article content is or is not helpful for understanding the size of the highlighted measurement. (1-2 sentence max).

Finally, for each trial we also require the participant to report the highlighted measurement using a text box. We use incorrect answers to this question to filter responses from participants who may not have paid attention.

We posted the study as a single HIT on Amazon's Mechanical Turk available to U.S. workers with an approval rating of 95% or higher. We ensured that the 10 trials assigned to each participant included at least one baseline version, one generic spatial analogy, and one personalized spatial analogy. Assignment of article versions was otherwise randomized and counter-balanced across participants. The HIT carried a reward of $3.00 and participants could earn an additional bonus of $1.00 if their responses to the attention verification questions were correct and their responses to the free text question were thoughtful.
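One simple way to satisfy the "at least one of each version" constraint is sketched below. It is an illustration under stated assumptions, not the assignment procedure used in the study: the condition labels and function name are hypothetical, and the sketch does not implement counter-balancing across participants.

```python
import random

CONDITIONS = ("baseline", "generic", "personalized")


def assign_conditions(n_trials=10, rng=None):
    """Return a shuffled list of n_trials conditions containing each one at least once."""
    rng = rng or random.Random()
    if n_trials < len(CONDITIONS):
        raise ValueError("need at least one trial per condition")
    # Guarantee one trial per condition, then fill the rest at random.
    assignment = list(CONDITIONS)
    assignment += [rng.choice(CONDITIONS) for _ in range(n_trials - len(CONDITIONS))]
    rng.shuffle(assignment)  # randomize which article gets which version
    return assignment


# The i-th entry is the version shown for the i-th article.
example = assign_conditions(10, random.Random(0))
```

Seeding the generator per participant makes an assignment reproducible, which helps when auditing which version each worker saw.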

Results
40 workers completed the task (400 trials). We removed responses for 1 trial in which the worker incorrectly identified the highlighted element. The mean time to complete the HIT was 26.3 minutes.

We first analyzed the results of our landmark familiarity manipulation check question to confirm that personalized spatial analogies did in fact use more familiar landmarks than generic spatial analogies. The mean perceived familiarity of the landmark for personalized spatial analogies was 1.8 points higher on a 7 point scale (µ=5.39, σ=2.53) than that for generic analogies (µ=3.62, σ=2.19; t(232.92)=-6.081, p < 0.0001). (Baseline version users did not view an analogy map with a landmark, so they were not asked for this rating.)
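The fractional degrees of freedom reported above are consistent with an unequal-variance (Welch) t-test, although the text does not name the test, so treat that reading as an assumption. The sketch below shows how the Welch statistic and its Welch-Satterthwaite degrees of freedom are computed. The function name and the sample ratings are made up and are not study data.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Squared standard errors of each group mean (sample variance / n).
    va = variance(a) / na
    vb = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical 7-point familiarity ratings for two conditions.
generic = [1.0, 2.0, 3.0, 4.0, 5.0]
personalized = [2.0, 4.0, 6.0, 7.0, 6.0]
t_stat, dof = welch_t(generic, personalized)
```

The sign of t depends only on the order in which the two groups are passed, so a negative value simply means the first group's mean is lower.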

We next assessed H1: that users find personalized spatial analogies more helpful for understanding the measurements compared to the typical presentation of the baseline condition. We used an ANOVA (F(2,347)=30.43) followed by a Tukey HSD test to compare personalized spatial analogies to the baseline and to generic spatial analogies. The mean rated helpfulness of the content for understanding the size of the measurement was 1.9 points higher (µ=4.31, σ=2.14) for personalized spatial analogies than for the baseline (µ=2.41, σ=1.59; p_adj < 0.001).
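For reference, the F statistic of a one-way ANOVA over the three conditions is computed as sketched below. The sketch covers only the omnibus F test; the Tukey HSD follow-up requires the studentized range distribution and is omitted. The function name and the sample ratings are hypothetical and are not study data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical helpfulness ratings for baseline, generic, and personalized trials.
ratings = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df_b, df_w = one_way_anova_f(ratings)
```

A large F indicates that at least one condition mean differs from the others; pairwise comparisons such as Tukey HSD are then needed to say which ones.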

We then assessed H2: that users find personalized spatial analogies more helpful for understanding the measurements compared to generic spatial analogies. The average helpfulness for the personalized spatial analogies condition was 0.69 points higher than for the generic spatial analogies condition (µ=3.63, σ=1.98; p_adj < 0.01). The Tukey HSD test also indicated that generic spatial analogies received higher ratings relative to the baseline (p_adj < 0.001).

In describing factors that affected whether the content was or was not helpful, users of personalized spatial analogies frequently described personal experiences with the landmark they saw. For example, one participant mentioned that “It helps to understand the distance because I have walked the trails around this landmark many times”. On the other hand, multiple responses from users of generic spatial analogies expressed frustration toward the analogy when the landmarks were not familiar to the respondents: “I’m from upstate New York and don’t have much familiarity with the NYC area. Therefore it was difficult to contextualize the distance between the two places.” Other responses from users of generic analogies suggest that general familiarity can provide some help by giving the user a rough guideline for judging the measurement size. For example, one user explained that “Although I’ve never been to Central Park. I am a bit familiar with it from TV shows and movies. So the measurements given I can generally think of and what exactly they mean.”

DISCUSSION AND FUTURE WORK

Limitations
While self-reported helpfulness provides a useful signal of whether users find our tools helpful for understanding, further comprehension studies that elicit and test users' understanding relative to their existing mental models are important for future work.

Future Work
The findings from our user study suggest that proximity of a user to a landmark effectively captures personal familiarity. This is a simple proxy, but our work showed how far this simple proxy could go in supporting the generation of useful personalized analogies. However, many other factors will also contribute to personal familiarity, such as what types of locations a user spends time in or where people who are close connections to the user are located. Incorporating information from social media or navigation behavior would be another way to capture these other aspects of personal familiarity. Future work in automated personalized spatial analogies should explore additional factors beyond proximity and general familiarity that can be used to predict personal familiarity.

Figure 10. Our personalized spatial analogy reader application tags measurements and locations in the article. We use an energy function optimization approach to generate personalized spatial analogies, which are presented using interactive map visualizations. The example map depicts a personalized analogy for the distance 100 miles for a user at M.I.T. in Cambridge, MA (42.360051, -71.094214).

Other areas for future work include expressing analogies for distance using travel times in lieu of actual distances. For some people, travel time may be an easier proxy for understanding distance than a conventional distance measure [9, 28]. Our approach can be extended to present travel time information along with or instead of conventional distance measures.

While we believe that improving understanding of spatial measurements in a news reading context is a powerful application of our tools, our approach may also prove beneficial in other settings. For example, our personalized analogies may be useful in a digital learning setting for helping students develop a better understanding of geographical information. Other promising future applications include navigational applications, in which our tools might be used to deliver personalized distance or travel time information to help a user plan or navigate during a trip.

Familiarity and Multiplier Trade-off
Our study results show that the participants were more likely to find the analogies helpful when they accounted for proximity between the user and the landmark. When a landmark was not well known, as in generic spatial analogies, ratings were lower and participants often commented on their inability to relate to the size of the measurement. However, further study is needed to explore the trade-off between personal familiarity and the multiplier.

CONCLUSION
We contribute tools for generating personalized spatial analogies that utilize a user's location and a large landmark dataset to contextualize spatial measurements. We identify criteria for effective personalized spatial analogies and develop an energy function minimization approach for applying these criteria to select landmarks for which the analogous spatial property is close to the target measurement but which are more familiar to the user. We present an interactive application tool that tags distances, areas, and locations in a text article and presents personalized spatial analogies using interactive maps. We find that users who view a personalized spatial analogy map generated by our system rate the helpfulness of the article content higher than users who saw only the text article or a generic spatial analogy map.

ACKNOWLEDGEMENTS
This work was partially supported by a Tableau Fellowship.

REFERENCES
1. 2015. Leaflet JS. (2015). http://leafletjs.com/

2. 2015. NLTK Corpus Reader Documentation. (2015). http://www.nltk.org/_modules/nltk/corpus/reader/lin.html

3. 2015. Open Calais. (2015). http://new.opencalais.com/

4. 2015. Yelp API. (2015).

5. Susana Bautista, Raquel Hervás, Pablo Gervás, Richard Power, and Ra Williams. 2011. How to Make Numerical Information Accessible: Experimental Identification of Simplification Strategies. (2011).

6. Michael Bostock, Vadim Ogievetsky, and Jeffrey Heer. 2011. D3 Data-Driven Documents. IEEE Transactions on Visualization and Computer Graphics 17, 12 (Dec. 2011), 2301–2309. DOI: http://dx.doi.org/10.1109/TVCG.2011.185

7. Elizabeth M Brannon. 2006. The representation of numerical magnitude. Current opinion in neurobiology 16, 2 (2006), 222–229.

8. C Brenner and B Elias. 2003. Extracting landmarks for car navigation systems using existing GIS databases and laser scanning. International archives of photogrammetry remote sensing and spatial information sciences 34, 3/W8 (2003), 131–138.

9. Pat Burnett. 1978. Time cognition and urban travel behavior. Geografiska Annaler. Series B. Human Geography (1978), 107–115.

10. David Caduff and Sabine Timpf. 2008. On the assessment of landmark salience for human navigation. Cognitive processing 9, 4 (2008), 249–267.

11. Jamie I. D. Campbell. 2005. Handbook of Mathematical Cognition. Psychology Press.

12. X. Cheng and D. Roth. 2013. Relational Inference for Wikification. In EMNLP.

13. Fanny Chevalier, Romain Vuillemot, and Guia Gali. 2013. Using Concrete Scales: A Practical Framework for Effective Visual Depiction of Complex Measures. IEEE TVCG 19, 12 (2013), 2426–2435.

14. Helen Couclelis, Reginald G Golledge, Nathan Gale, and Waldo Tobler. 1987. Exploring the anchor-point hypothesis of spatial cognition. Journal of Environmental Psychology 7, 2 (1987), 99–122.

15. Stanislas Dehaene, Elizabeth Spelke, and Lisa Feigenson. 2004. Core Systems of Number. Trends in Cognitive Sciences 8, 7 (2004), 307–314.

16. Birgit Elias. 2003. Extracting landmarks with data mining methods. In COSIT. Springer, 375–389.

17. John Eliot. 2002. About spatial intelligence: I. Perceptual and Motor Skills 94, 2 (April 2002), 479–486. DOI: http://dx.doi.org/10.2466/pms.2002.94.2.479

18. Lisa Feigenson, Stanislas Dehaene, and Elizabeth Spelke. 2004. Core systems of number. Trends in cognitive sciences 8, 7 (2004), 307–314.

19. Nathan Gale, Reginald G Golledge, William C Halperin, and Helen Couclelis. 1990. EXPLORING SPATIAL FAMILIARITY. The Professional Geographer 42, 3 (1990), 299–313.

20. Tong Gao, Jessica R Hullman, Eytan Adar, Brent Hecht, and Nicholas Diakopoulos. 2014. NewsViews: an automated pipeline for creating custom geovisualizations for news. In Proceedings of the 32nd annual ACM conference on Human factors in computing systems. ACM, 3005–3014.

21. Reginald G. Golledge and Robert J. Stimson. 1996. Spatial Behavior: A Geographic Perspective. The Guilford Press, New York.

22. Google. 2015. Google Maps Javascript API. (2015). https://developers.google.com/maps/documentation/javascript/examples/geocoding-simple

23. Floraine Grabler, Maneesh Agrawala, Robert W Sumner, and Mark Pauly. 2008. Automatic generation of tourist maps. Vol. 27. ACM.

24. Jeffrey L. Griffin and Robert L. Stevenson. 1994. The Effectiveness of Locator Maps in Increasing Reader Understanding of the Geography of Foreign News. Journalism & Mass Communication Quarterly 71, 4 (Dec. 1994), 937–946. DOI: http://dx.doi.org/10.1177/107769909407100417

25. Mordechai (Muki) Haklay and Patrick Weber. 2008. OpenStreetMap: User-Generated Street Maps. IEEE Pervasive Computing 7, 4 (Oct. 2008), 12–18. DOI: http://dx.doi.org/10.1109/MPRV.2008.80

26. Stefan Hansen, Kai-Florian Richter, and Alexander Klippel. 2006. Landmarks in OpenLS: a data structure for cognitive ergonomic route directions. In Geographic information science. Springer, 128–144.

27. Brent Hecht, Samuel H Carton, Mahmood Quaderi, Johannes Schöning, Martin Raubal, Darren Gergle, and Doug Downey. 2012. Explanatory semantic relatedness and explicit spatialization for exploratory search. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval. ACM, 415–424.

28. Sungsoo Ray Hong, Yea-Seul Kim, Jong-Chul Yoon, and Cecilia R Aragon. 2014. Traffigram: distortion for clarification via isochronal cartography. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 907–916.

29. Alexander Klippel and Stephan Winter. 2005. Structural salience of landmarks for route directions. In Spatial information theory. Springer, 347–362.

30. Kengo Koiso, Takehisa Mori, Hiroaki Kawagishi, Katsumi Tanaka, and Takahiro Matsumoto. 2000. Infolod and landmark: Spatial presentation of attribute information and computing representative objects for spatial data. International Journal of Cooperative Information Systems 9, 01n02 (2000), 53–75.

31. Manfred Krifka. 2002. For verbosity and precision1. Sounds and systems: studies in structure and change. A festschrift for Theo Vennemann 141 (2002), 439.

32. Michael Luca and Georgios Zervas. 2013. Fake it till you make it: Reputation, competition, and Yelp review fraud. Harvard Business School NOM Unit Working Paper 14-006 (2013).

33. Kevin Lynch. 1960. The image of the city. Vol. 11. MIT press.

34. Alejandra Magana, Sean Brophy, and Lynn Bryan. 2012. An Integrated Knowledge Framework to Characterize and Scaffold Size and Scale Cognition (FS2C). Int’l J. of Science Educ. 34, 14 (2012), 2181–2203.

35. Alejandra J Magana. 2014. Learning strategies and multimedia techniques for scaffolding size and scale cognition. Computers & Education 72 (2014), 367–377.

36. Daniel R. Montello. 1997. The Perception and Cognition of Environmental Distance. In Direct Sources of Information, Proc. Int. Conf. on Spatial Information Theory: A Theoretical Basis for GIS. 297–311.

37. John Allen Paulos. 1988. Innumeracy: Mathematical Illiteracy and Its Consequences. Macmillan.

38. Martin Raubal and Stephan Winter. 2002a. Enriching wayfinding instructions with local landmarks. Springer.

39. Martin Raubal and Stephan Winter. 2002b. Enriching wayfinding instructions with local landmarks. Springer.

40. Kai-Florian Richter and Alexander Klippel. 2005. A model for context-specific route directions. In Spatial cognition IV. Reasoning, action, interaction. Springer, 58–78.

41. Bruce M. Ross and Trygg Engen. 1959. Effects of round number preferences in a guessing task. Journal of Experimental Psychology 58, 6 (1959), 462–468. DOI: http://dx.doi.org/10.1037/h0049112

42. Charles Seife. 2010. Proofiness: The Dark Arts of Mathematical Deception (1 ed.). Viking Adult.

43. Taro Tezuka and Katsumi Tanaka. 2005. Landmark extraction: A web mining approach. In Spatial information theory. Springer, 379–396.

44. Taro Tezuka, Yusuke Yokota, Mizuho Iwaihara, and Katsumi Tanaka. 2004. Extraction of cognitively-significant place names and regions from web-based physical proximity co-occurrences. In Web Information Systems–WISE 2004. Springer, 113–124.

45. Barbara Tversky. 2003. Structures of mental spaces: how people think about space. Environment and behavior 35, 1 (2003), 66–80.

46. Thaddeus Vincenty. 1975. Direct and inverse solutions of geodesics on the ellipsoid with application of nested equations. Survey review 23, 176 (1975), 88–93.

47. Chankong Vira and Yacov Y Haimes. 1983. Multiobjective decision making: theory and methodology. Number 8. North-Holland.

48. Wikipedia. 2004. API: Main page. (2004). https://www.mediawiki.org/wiki/API:Main_page
