Vanessa Lopez: Linked Data and Search

Date post: 12-Jan-2015
Dublinked Technical Workshop - Linked Data & Search by Vanessa Lopez
Linked Data and Search Vanessa Lopez Smarter Cities Technology Centre IBM Research Ireland
IBM Research – Ireland

© 2012 IBM Corporation


Background: Why Linked Data

• Provides explicit semantics
• Interoperability-focused: enables automatic discovery and ingestion
• Large existing corpora
• Fundamentally incremental (like the Web)
• W3C standard representation and common format
• Government push (e.g. data.gov, data.gov.uk, Linked Government Data)


Yes, yes... richer structured queries, but...

... limited usability for both data publishers and consumers


How can we help users query and explore Semantic Web content?



State of the art

• Semantic search over messy, heterogeneous data and mash-ups
• Exploratory and faceted systems
• Query builders and relationship finders
• Question answering over Linked Data sources
• Google Knowledge Graph

http://technologies.kmi.open.ac.uk/poweraqua


State of the art  


Linked Data and Search - Problem domain:

What makes City Data so special?

How can we make it more accessible?


Semantic processing of urban data – why is it different?

• How can we go from raw data to insight into the operation of a city with minimal effort?  

Return-on-Investment (because data integration is expensive)  

Fit-for-all (citizen engagement)  


Challenges: Big city data

Volume
• Lots of relevant information
• Not linked to authoritative sources

Velocity
• Streams
• Frequent updates

Variety
• Different models and file formats
• Open domain - unknown schema

Veracity
• Diverse sources
• Difficult to assess quality


Business case: open data as a means to an end  


Business case

• Why are ambulances late?

Sources of information

• Hundreds of datasets from four municipal authorities in Dublin
• Most static, some dynamic
• Social media: Twitter, LiveDrive, Eventful, Eventbrite, …
• Linked Data: DBpedia, ..
• Vocabularies: IPSV, FOAF, VOID, PROV, DCAT, WGS84

Domain of information

• Locations of health services
• Ambulance call-outs and response times
• Tweets about traffic congestion
• Geo-located tweets about people movement
• Road network
• Event Web services
• …



• Linked Data to enrich data and give contextual insight for publishers and consumers:
– Publish (vocabularies, annotation)
– Discovery and Search (metadata / cataloguing, full-text indexing, semantic entities)
– Link (schema alignment, linked data, social media)
– Extract interesting views
– Reason (diagnose traffic problems)

Ubiquitous aspects: Provenance, Governance, Performance, Security, Privacy


Approach – Data model

Documents + Metadata

Structure -> Entities -> Links -> Views -> Insight

Tabular Graph
  C1 a Cell
  C1 inRow r1
  C1 value "name"

Entity Graph
  e1 a Entity
  e1 inRow r1
  e1 inCol c2

Annotation Graph
  e1 a Entity
  e1 rdfs:label "name"
  e1 addr "X st"
  e1 lat "53.23"
  …

Mapping Graph
  e1 a Entity
  e1 sameAs e2
  …
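The four graphs can be read as successive layers of triples derived from one table. The following is a minimal sketch in Python that uses plain tuples instead of an RDF library. The function names, the sample row, the cell identifiers and the choice of column 2 as the entity column are all assumptions made for illustration; this is not the platform's implementation.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

def tabular_graph(rows: List[List[str]]) -> Set[Triple]:
    """One node per cell, recording its row, its column and its raw value."""
    triples: Set[Triple] = set()
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row, start=1):
            cell = f"C{r}_{c}"  # hypothetical cell identifier
            triples |= {
                (cell, "a", "Cell"),
                (cell, "inRow", f"r{r}"),
                (cell, "inCol", f"c{c}"),
                (cell, "value", value),
            }
    return triples

def entity_graph(rows: List[List[str]], entity_col: int) -> Set[Triple]:
    """Promote one column to entities: e1 for row 1, e2 for row 2, and so on."""
    triples: Set[Triple] = set()
    for r in range(1, len(rows) + 1):
        e = f"e{r}"
        triples |= {(e, "a", "Entity"), (e, "inRow", f"r{r}"), (e, "inCol", f"c{entity_col}")}
    return triples

def annotation_graph(rows: List[List[str]], header: List[str], entity_col: int) -> Set[Triple]:
    """Label each entity and attach the other cells of its row as properties."""
    triples: Set[Triple] = set()
    for r, row in enumerate(rows, start=1):
        e = f"e{r}"
        triples.add((e, "rdfs:label", row[entity_col - 1]))
        for c, value in enumerate(row, start=1):
            if c != entity_col:
                triples.add((e, header[c - 1], value))
    return triples

def mapping_graph(labels_a: Dict[str, str], labels_b: Dict[str, str]) -> Set[Triple]:
    """Link entities of two datasets whose labels match, ignoring case."""
    by_label = {label.lower(): e for e, label in labels_b.items()}
    return {
        (e, "sameAs", by_label[label.lower()])
        for e, label in labels_a.items()
        if label.lower() in by_label
    }

# Made-up sample table with a single row.
header = ["id", "name", "addr", "lat"]
rows = [["1", "Clinic A", "X st", "53.23"]]
```

Each layer only adds triples, so a dataset can stop at the tabular graph and still be queried, which is the pay-as-you-go idea.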

Pay-as-you-go, Gain-as-you-go  

• Structured metadata -> Queries over the metadata
• Files into a standard representation -> Queries over the data
• Partially integrate schemata -> Queries across datasets
• Integrate globally -> Queries across Web data


Discovery: Publishing and Cataloguing  

• METADATA
– Many data publishers and disconnected datasets
– Link metadata using domain vocabularies: IPSV
– Convert to simple RDF format


Vocabulary matching  
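As an illustration of linking metadata to a domain vocabulary and converting it to simple RDF, here is a minimal Python sketch. The vocabulary excerpt, the example.org URIs and the function names are hypothetical, and exact matching on normalised labels stands in for whatever matching the platform actually performs. Only the Dublin Core terms namespace is a real URI.

```python
import re
from typing import List

# Hypothetical excerpt of a subject vocabulary: label -> placeholder URI.
VOCAB = {
    "beaches": "http://example.org/ipsv/beaches",
    "road traffic": "http://example.org/ipsv/road-traffic",
    "health services": "http://example.org/ipsv/health-services",
}

DCT = "http://purl.org/dc/terms/"  # Dublin Core terms namespace

def normalise(text: str) -> str:
    """Lower-case and collapse punctuation so keywords compare loosely."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def match_vocabulary(keywords: List[str]) -> List[str]:
    """Return URIs of vocabulary terms whose label equals a normalised keyword."""
    index = {normalise(label): uri for label, uri in VOCAB.items()}
    return [index[normalise(k)] for k in keywords if normalise(k) in index]

def to_ntriples(dataset_uri: str, title: str, keywords: List[str]) -> List[str]:
    """Serialise one metadata record as N-Triples lines."""
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    lines = [f'<{dataset_uri}> <{DCT}title> "{escaped}" .']
    for uri in match_vocabulary(keywords):
        lines.append(f"<{dataset_uri}> <{DCT}subject> <{uri}> .")
    return lines

# Made-up metadata record; the keyword "unknown" has no vocabulary match.
lines = to_ntriples(
    "http://example.org/dataset/1",
    "Fingal Beaches",
    ["Beaches", "Road-Traffic", "unknown"],
)
```

Keywords without a match are simply left unlinked, so publishing never blocks on vocabulary coverage.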



Search and linking  

Mining descriptions  

Full text indexing  

Entity linking  

Open metadata  

• Full-text indexing for search over metadata and content
• Entity linking and navigation (keywords, categories, publishing agencies, regions, ...)
• Open metadata and vocabularies (VOID, PROV, etc.) for data discovery and linking
• Mining descriptions (DBpedia Spotlight)
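Full-text indexing and entity linking over dataset descriptions can be sketched in a few lines of Python. This is a toy inverted index with a dictionary lookup, assuming made-up descriptions and a hypothetical gazetteer. It is not the DBpedia Spotlight API, nor the indexing engine used in the platform.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

def tokenize(text: str) -> List[str]:
    """Split text into lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each token to the set of dataset ids whose description contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the sorted ids of datasets that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in tokens)))

def link_entities(text: str, gazetteer: Dict[str, str]) -> List[str]:
    """Return the URIs of gazetteer entries mentioned in the text, in order."""
    return [gazetteer[t] for t in tokenize(text) if t in gazetteer]

# Made-up dataset descriptions and a hypothetical one-entry gazetteer.
docs = {
    "d1": "Beaches in Fingal with bathing water quality",
    "d2": "Traffic volumes in Dublin City",
    "d3": "Fingal County Council parks",
}
gazetteer = {"fingal": "http://example.org/entity/Fingal"}
index = build_index(docs)
```

Here `search(index, "fingal beaches")` narrows to the datasets that mention both words, while `link_entities` turns a mention into a URI that can be used for navigation.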


Faceted search: "beaches in Fingal"  
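A query such as "beaches in Fingal" combines a keyword with a facet value. The sketch below shows that combination in Python over an invented three-record catalogue. The field names (`agency`, `category`, `region`) and the records are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, List

# Invented catalogue records; the field names are assumptions.
DATASETS: List[Dict[str, str]] = [
    {"id": "d1", "title": "Beaches", "agency": "Fingal County Council",
     "category": "Environment", "region": "Fingal"},
    {"id": "d2", "title": "Beaches", "agency": "Dublin City Council",
     "category": "Environment", "region": "Dublin City"},
    {"id": "d3", "title": "Parks", "agency": "Fingal County Council",
     "category": "Recreation", "region": "Fingal"},
]

def faceted_search(datasets: List[Dict[str, str]], keyword: str, **facets: str) -> List[Dict[str, str]]:
    """Keep datasets whose title contains the keyword and whose facets all match."""
    return [
        d for d in datasets
        if keyword.lower() in d["title"].lower()
        and all(d.get(field) == value for field, value in facets.items())
    ]

def facet_counts(datasets: List[Dict[str, str]], field: str) -> Counter:
    """Count the values of one facet across a result set, for display beside the results."""
    return Counter(d[field] for d in datasets)

results = faceted_search(DATASETS, "beaches", region="Fingal")
```

The counts returned by `facet_counts` are what a faceted interface would show next to each filter, so users can see how a further refinement would narrow the results.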


Content integration

• Incrementally lift data content (beyond search to querying across dataset content)
– Extract entities represented in RDF (PAYGO)
– Label extraction and annotation
– Link when we have higher confidence (lat, long)
– Geo-coding and taxonomy of tweets (traffic)

Minimal entry cost
Provenance-based dataset ranking
Geocoding
Label extraction
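One way to read "link when we have higher confidence (lat, long)" is to assert a link only when two entities lie within a small distance of each other. The Python sketch below uses the haversine formula for that check. The 50 m threshold, the function names and the coordinates are assumptions, not values from the platform.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points on a spherical Earth."""
    radius = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def link_by_location(a: Dict[str, Point], b: Dict[str, Point], max_m: float = 50.0) -> List[Tuple[str, str, str]]:
    """Emit (id_a, "sameAs", id_b) for every pair closer than max_m metres."""
    links = []
    for id_a, (lat_a, lon_a) in a.items():
        for id_b, (lat_b, lon_b) in b.items():
            if haversine_m(lat_a, lon_a, lat_b, lon_b) <= max_m:
                links.append((id_a, "sameAs", id_b))
    return links

# Made-up coordinates: e2 is about 11 m from e1, e3 is several kilometres away.
dataset_a = {"e1": (53.3498, -6.2603)}
dataset_b = {"e2": (53.3499, -6.2603), "e3": (53.4000, -6.2000)}
links = link_by_location(dataset_a, dataset_b)
```

The nested loop is quadratic in the number of entities, which is fine for a sketch; a spatial index would be the usual choice for larger datasets.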


Views

• Beyond search to guiding the user to create meaningful views:
– Guide users to annotate data, recommend related datasets and create data views on the fly
– Ranking and context-based recommendations
– Allow semantic-based analysis on multiple views

Hidden information discovery

Multiple endpoints  

Cross domain queries  

Multiple interpretations  
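Recommending related datasets can be illustrated by ranking datasets on the entities they share. The Python sketch below uses Jaccard similarity, which is a stand-in chosen for simplicity; the slides do not say which ranking the platform uses. The dataset names and entity sets are invented.

```python
from typing import Dict, List, Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target: str, entities_by_dataset: Dict[str, Set[str]], top_n: int = 3) -> List[str]:
    """Rank the other datasets by entity overlap with the target, best first."""
    scored = [
        (jaccard(entities_by_dataset[target], entities), name)
        for name, entities in entities_by_dataset.items()
        if name != target
    ]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked if score > 0][:top_n]

# Invented entity sets extracted from three datasets.
entities_by_dataset = {
    "ambulance_callouts": {"Dublin", "Hospital", "Road"},
    "traffic_tweets": {"Dublin", "Road", "Congestion"},
    "beaches": {"Fingal", "Beach"},
}
related = recommend("ambulance_callouts", entities_by_dataset)
```

Datasets with no shared entities are dropped rather than listed with a zero score, so the recommendation list stays short.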



• Currently: Web services and technology demonstrator  

• Next: Open RDF-based data management deployed in Dublin City (read/write). Deployment of traffic diagnoser.  

• SPUD: Semantic Processing of Urban Data (2nd prize at the Semantic Web Challenge – ISWC)  

• Live demo: www.dublinked.ie/sandbox/SemanticWebChall

Thank you!

Reference Publication:

• QuerioCity: A Linked Data Platform for Urban Information Management. V. Lopez, S. Kotoulas, M. L. Sbodio, M. Stephenson, A. Gkoulalas-Divanis, P. Mac Aonghusa. In-Use track at the 11th International Semantic Web Conference (ISWC).

City Fabric Team:

Spyros Kotoulas, Vanessa Lopez, Raymond Lloyd, Marco Luca Sbodio, Freddy Lecue, Martin Stephenson, Elizabeth Daly, Veli Bicer, Aris Gkoulalas-Divanis, Giusy Di Lorenzo, Anika Schumann, Denis Patterson, and Pol Mac Aonghusa

