IBM - Dublin Research Lab
Searching in the City of Knowledge
SIGIR Tutorial
Veli Bicer, Vanessa Lopez
IBM Research
About the Tutorial
Scope of the Tutorial
Part I
Beyond Local Search
Veli Bicer
IBM Research
Outline
• A Planet of Smarter Cities
• City Data and Information
• Making City Search Smarter
A Planet of Smarter Cities
“Cities have the capability of providing something for everybody, only because, and only when, they are created by everybody.”
Jane Jacobs
A planet of smarter cities: In 2007, for the first time in history, the
majority of the world’s population—3.3 billion people—lived in
cities. By 2050, city dwellers are expected to make up 70% of
Earth’s total population, or 6.4 billion people.
IBM Research Worldwide
• China, Watson, Almaden, Austin, Tokyo, Haifa, Zurich, India, Dublin, Melbourne, Brazil
Smarter Cities Technology Centre
• Instrumented, Interconnected, Intelligent
• Dublin Test Bed: Energy, Movement, Water
• Seed Projects: Real World Insight | Data Sets | Devices
• Optimization, Predictive Modelling, Forecasting, Simulation
• Solutions that Sustain Economic Development: Driving New Economic Models, Significant Collaborative R&D, Skills Development & Growth, Competitive Advantage
• Collaboration and Access to Local, Regional & Worldwide Network: SMEs | MNCs | Universities | Public Sector | VC Community
• Intelligent Urban and Environmental Analytics and Systems
• Smart City Solutions: Integrated Cross Domain Solutions
• City Fabric
Many Visions of what a Smarter City might be
• A “mission control” for infrastructure
• A showcase for urban planning concepts
• A totally “wired” city
• A self-sufficient, sustainable eco-city
City Data and Information
“The country places and the trees don’t teach me anything, but the people in the
city do”
Socrates
City of Data and Information: Many Areas
• Large, open and continuous data environment from heterogeneous domains:
– Transportation, Social Media, Energy Management, City Management
– Region, Supply Chain, Food System, Health Care, Water Management
– and even more…
City Data Trends
(Figure: timeline of city data activity over time)
• 1993: SEC Online
• 2004: USG announces e-Gov 2.0
• 2009: Data.gov.uk, Data.gov (US)
• 2010: Amazon, Google & MSoft
• 2011+: Gov 3.0, City as an Enterprise
• Stage labels: Content Factual & Static; Content Structure Innovation; Aggregation & Efforts to create linkage based on Semantic Web; Innovation based on Collaboration & Social Innovation; Ecosystem increasingly focused on long-term sustainability
• >350 ‘Open City Data Catalogs’ (data.gov)
• >25 Billion Triples on Linked Data Cloud
• 35 Cities in Open Data Hackday, 12/2010
• Publicdata.eu – LOD2 for Citizen study due 2014
Some Traffic-related Data Sets from Dublin
• Big data
• Heterogeneous data
• Static and continuous data
• Not all open yet
• Not linked yet
• Noisy data (inconsistent, imprecise)
POWERED by
Open Innovation Portal: www.dublinked.ie
Dublinked - outcomes
• Publish and put into context (100s of datasets, 1000s of files)
• Create innovation ecosystem
• Pool resources
• Share results
• Dataset categories: Waste Collection, Property Management, Environment, Demographics, Business & Retail, Commercial Valuations and Rates, Tourism, Transport & Access, Crime, Heritage, Mapping, Housing, Water, Fault Reporting, Events, Health, Planning
More on city data will be covered in Part II
Making City Search Smarter
“We cannot afford merely to sit down and deplore the evils of city life as inevitable, when cities are constantly growing,
both absolutely and relatively. We must set ourselves vigorously about the task of improving them; and this task
is now well begun.”
Theodore Roosevelt
City Search: Not a revolution, but evolution
(Figure labels: Genes, Genetic Drift)
• Web Search
– Information Need: Ordinary Web users want to locate information (e.g. documents) on the Web.
– Search Relevance: Models relevance using IR models based on content, Web popularity, clickthrough data, etc.
– Information Source: Uses Web-based information such as document content, the Web graph, search engine logs, etc.
• Local Search
– Information Need: Mainly targets queries that locate businesses within a geographic area.
– Search Relevance: Extends Web-based relevance with spatiotemporal relevance.
– Information Source: Enhanced with location and time information collected from mobile sensors, IP addresses, etc.
• City Search
– Information Need: Concerns all types of (complex) queries encountered in everyday city life, e.g. city events.
– Search Relevance: Relevance is highly dimensional and context-dependent, and leverages more city-specific information than Web information.
– Information Source: City data provides a unique source of information for understanding the city context.
What do people search for?
• A recent user study of public displays in urban areas examined the information needs of citizens [Kukka, PUC, 2013]
• Conducted in Oulu, Finland
• Diversity of the information
• Need for city awareness
• Differences from Web queries [Spink et al, 2001]
What do people search for?
Top 8 categories according to user scores [Kukka, PUC, 2013]
Local search is on the rise
Percentage of local search traffic in major search engines
According to [Zhang et al, GIR’06], 83.77% of Yahoo! geo-queries contain a city name
Source: Chitika, 2012
Local Queries: Geotagging
People tend to use geotags
Source: Search Engine Land, 2011
Search Topics
Not every city is the same
Source: Chitika, 2013
Time matters
Source: Chitika, 2013, Lane et al, Ubicomp’10
Distance
• Users’ “distance sensitivity” is relative to the type of business
considered
Source: Berberich, SIGIR’11
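One way to picture category-dependent distance sensitivity is a distance-decay score whose decay rate varies by business type. The sketch below is only an illustration: the category names, the decay scales, and the exponential form are all assumptions, not the model from the cited work.

```python
import math

# Hypothetical decay scales in kilometres (assumed values): a smaller scale
# means users are more sensitive to distance for that kind of business.
DECAY_KM = {"coffee": 0.5, "furniture": 10.0}


def distance_score(category: str, distance_km: float,
                   default_scale_km: float = 2.0) -> float:
    """Return a score in (0, 1] that decays exponentially with distance."""
    scale = DECAY_KM.get(category, default_scale_km)
    return math.exp(-distance_km / scale)
```

With these assumed scales, a coffee shop 3 km away scores far lower than a furniture store at the same distance, which is the qualitative behaviour the slide describes.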
Relevance
• Still not awake? Coffee? ☺
• Costa? Starbucks?
– Does the distance really matter?
Relevance
• Local vs. Web popularity
• What else?
• More information: more relevance to the user!
Source: Foursquare, July 2013
Relevance
• More Semantics
– “skate ramp park Dundrum Town Center”
• The information is not contained in one data
source
• Need information about parks, POI, location etc.
(Figure: graph linking dublinked and dbpedia data; node and edge labels: park, hasSkate, yes, dtc, dundrum, located)
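A query such as “skate ramp park Dundrum Town Center” can only be answered by joining facts held in different sources. The sketch below illustrates that join over two tiny in-memory triple lists. Every identifier, predicate, and fact in it is hypothetical and stands in for real data; it is not the actual schema of any dataset.

```python
# Hypothetical triples from a city open-data source (illustrative only).
PARK_FACTS = [
    ("park_a", "type", "park"),
    ("park_a", "hasSkate", "yes"),
    ("park_a", "located", "dundrum"),
    ("park_b", "type", "park"),
    ("park_b", "hasSkate", "no"),
    ("park_b", "located", "dundrum"),
]

# Hypothetical triples from a general-purpose knowledge base (illustrative only).
PLACE_FACTS = [
    ("dtc", "label", "Dundrum Town Center"),
    ("dtc", "located", "dundrum"),
]


def objects(triples, subject, predicate):
    """Collect all objects for a given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def skate_parks_near(landmark_label):
    """Find parks with a skate ramp in the same area as the named landmark."""
    landmarks = {s for s, p, o in PLACE_FACTS
                 if p == "label" and o == landmark_label}
    areas = set()
    for landmark in landmarks:
        areas |= objects(PLACE_FACTS, landmark, "located")
    parks = {s for s, p, o in PARK_FACTS if p == "type" and o == "park"}
    return sorted(
        park for park in parks
        if "yes" in objects(PARK_FACTS, park, "hasSkate")
        and objects(PARK_FACTS, park, "located") & areas
    )
```

Neither list alone can answer the query: the landmark's area comes from one source and the park attributes from the other, so the shared area identifier is what links them.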
Relevance
• Need to buy new “furniture”?
Relevance
• Dublin TRIPS data:
– Journey times throughout the city
– Real-time data, updated every minute
– Historical data is available for every day since 9/7/2012
– Mined from SCATS-based (Sydney Coordinated Adaptive Traffic
System) intelligent transportation system for 500+ sites around Dublin
• Accessible from:
– http://dublinked.ie/datastore/datasets/dataset-215.php
• Visualization
– http://www.dublinked.ie/traffic/
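Journey-time data of this kind could feed relevance directly, for example by ranking candidate destinations by estimated travel time rather than straight-line distance. The sketch below assumes a simplified lookup from a route identifier to the latest journey time in seconds. The record layout, names, and values are hypothetical and do not reflect the actual TRIPS data format.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    distance_km: float
    route: str  # hypothetical identifier of the route leading to this place


def rank_by_journey_time(candidates, journey_time_s):
    """Order candidates by current journey time, skipping unknown routes."""
    known = [c for c in candidates if c.route in journey_time_s]
    return sorted(known, key=lambda c: journey_time_s[c.route])


# Hypothetical latest journey times in seconds, keyed by route identifier.
latest_times = {"route_1": 1500, "route_2": 600}

stores = [
    Candidate("near_store", 2.0, "route_1"),
    Candidate("far_store", 5.0, "route_2"),
]
ranked = rank_by_journey_time(stores, latest_times)
```

Under these assumed values the farther store ranks first because its route is currently faster, which is the kind of reordering that distance alone cannot produce.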
Relevance
• More transportation data
– Public Transport Route Networks
• http://dublinked.ie/datastore/datasets/dataset-258.php
– Dublin Bus GPS Data
• http://dublinked.com/datastore/datasets/dataset-304.php
– Dublin Bus GTFS data
• http://dublinked.ie/datastore/datasets/dataset-254.php
– Accessible Parking Places
• http://dublinked.com/datastore/datasets/dataset-049.php
– Roads and Streets in Dublin City
• http://dublinked.com/datastore/datasets/dataset-123.php
Relevance
• Buying your dream house
– Finding the houses?
– Is the price reasonable?
– How is the neighborhood?
– Perfect match!
Relevance
• Property Register Index: ~52,000 property sales
– Available at http://kdeg.cs.tcd.ie/propertyPriceMap/
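Recorded sale prices give one simple way to approach the “is the price reasonable?” question: compare an asking price with the median of recent sales in the same area. The sketch below is a minimal illustration; the function name and the sample figures are hypothetical and are not drawn from the register.

```python
from statistics import median


def price_ratio(asking_price, area_sale_prices):
    """Return the asking price divided by the median sale price in the area.

    A ratio well above 1.0 suggests the asking price is high for the area.
    """
    if not area_sale_prices:
        raise ValueError("no recorded sales for this area")
    return asking_price / median(area_sale_prices)


# Hypothetical sale prices for one neighbourhood (illustrative values only).
sample_sales = [200_000, 250_000, 300_000]
ratio = price_ratio(300_000, sample_sales)
```

The median is used here rather than the mean so that a few unusually expensive sales do not distort the comparison; that choice is a design assumption of this sketch.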
Relevance
• More city data:
– Amenities & Recreation
• http://dublinked.ie/datastore/by-category/amenities-recreation.php
– Schools
• http://dublinked.com/datastore/datasets/dataset-099.php
– Key developing areas
• http://dublinked.ie/datastore/datasets/dataset-134.php
– Air pollution monitoring data
• http://dublinked.ie/datastore/datasets/dataset-185.php
Relevance
• The “perfect” Irish weather ☺
• Choosing the best city activity
depends on the weather
– Indoor vs. Outdoor
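Weather dependence can be sketched as a context-aware re-ranking step in which outdoor activities are penalised when it rains. The activity tuples, base scores, and penalty factor below are hypothetical values chosen only to illustrate the idea.

```python
def rank_activities(activities, is_raining, rain_penalty=0.2):
    """Rank (name, base_score, is_outdoor) tuples, penalising outdoor ones in rain."""
    def score(activity):
        _name, base_score, is_outdoor = activity
        factor = rain_penalty if (is_outdoor and is_raining) else 1.0
        return base_score * factor

    ordered = sorted(activities, key=score, reverse=True)
    return [name for name, _base, _outdoor in ordered]


# Hypothetical activities with assumed base relevance scores.
activities = [
    ("park walk", 0.9, True),
    ("museum", 0.7, False),
]
```

Calling `rank_activities(activities, is_raining=True)` puts the indoor option first, while the same call with `is_raining=False` keeps the original order by base score.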
Information Sources
• Web Context
– Web documents
– User Queries
– Clickthrough Data
– Hyperlinks
• City Context
– Structured City Data
– City-specific Web Documents
– Sensor Data
– Social Media, Check-ins
– Road Network
– Transportation Data
– City Events
– Regional Information
– Municipality
– Crime and safety
– and much more…
Wrap Up
Cities and City Data
• The majority of the world’s population lives in cities
• Cities are dynamic entities combining people, systems, infrastructure, and businesses
• More and more city data is becoming available, enabling more insight
• City data is heterogeneous, multi-domain, noisy and big
City Search
• City search is an evolution of search into the city context
• Characterized by the specific information needs of people in everyday city life
• Drift in search relevance from the Web context to the city context
• Drift in the information sources used to drive the search process
What is next in this tutorial?
• Managing City Data
– Characteristics and types of city data
– Semantic processing and lifting
– Demos
• Searching City Data
– Challenges for the IR community
– A review of related approaches
– Future directions
References
• Himmelstein, M. (2005). Local search: The internet is the yellow pages. IEEE Computer.
• Berberich, K., König, A. C., Lymberopoulos, D., & Zhao, P. (2011). Improving local search ranking through external logs. In SIGIR.
• Kukka, H., Kostakos, V., Ojala, T., Ylipulli, J., Suopajärvi, T., Jurmu, M., & Hosio, S. (2013). This is not classified: everyday information seeking and encountering in smart urban spaces. Personal and Ubiquitous Computing.
• Spink, A., Wolfram, D., Jansen, M. B., & Saracevic, T. (2001). Searching the web: The public and their queries. Journal of the American Society for Information Science and Technology, 52(3), 226-234.
• Zhang, W. V., Rey, B., Stipp, E., & Jones, R. (2006). Geomodification in query rewriting. In GIR.