Open Data – reflections from behind the Big Firewall
Or, may you be cursed to live in interesting times
Open Data …. Why bother?
Open Contributed Content will become a core, strategic, economic resource – and the most accessible & scalable resource we possess.
Mobility, Openness & Connection will matter more than Presence & Rigid Structures
In 2013 expect generation of >850 Exabytes of Internet data.
Mostly user contributed content (versus traditional enterprise sources).
Global access to technology is already driving trends like ‘virtual citizenship’, ‘virtual employment’ & ‘social innovation’
On-demand interaction will increasingly be the norm for a global community of virtual innovators … who expect their user experience to be as simple as ‘using an appliance’
Open Data and Economics, or … ‘Greater Fool Investing’!!
Open data is a potential new 'raw material' for economic growth. It requires effort to produce and maintain.
Unlike traditional raw materials like oil, gas and minerals, its value increases fastest when it is open and shareable.
Bubble … "trade in high volumes at prices that are considerably at variance with intrinsic values".
Open Data alone does not generate direct economic benefit sufficient to offset production & operational costs … the question is … can it generate sufficient ‘value’ to be sustainable?
Incentives must be in place to sustain “economically significant” amounts of Open Data
Some bright lights … but we need answers before we run out of steam!!
How Private is Private?
Privacy is not absolute; it is a balance between Risk and Utility
Open Data usage is inherently contradictory
• Social media usage -> Maximize Utility + (Largely) Ignore Risk
• Enterprise usage -> Maximize Utility + Minimize Risk
Who carries liability in case of dispute? Uncertainty in usage policies is a substantial form of business risk
Recognize in policy and legislation that privacy is mutable - based on context
✔ Available Open Data useful to identify & characterize group behaviors
✖ Negative usage for ‘nuisance’ providers to identify high-value targets
{∃ (high value residences)} ∩ {∃ (long emergency response time)} ∩ {∃ (many local area crimes)} ⇒ {area where people might buy home security products}
(all available on open data sites near you)
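The intersection above can be sketched in a few lines of Python. This is only an illustration of how easily such a target list could be derived: the area identifiers and the three input sets are hypothetical placeholders, each standing in for the result of filtering one open dataset, and none of the values come from real data.

```python
# Hypothetical area identifiers; each set stands in for the result of
# filtering one open dataset. None of these values come from real data.
high_value_residences = {"area-1", "area-2", "area-4", "area-7"}
long_emergency_response = {"area-2", "area-3", "area-4", "area-8"}
many_local_crimes = {"area-2", "area-4", "area-5"}


def target_areas(*criteria):
    """Return the areas that satisfy every criterion (set intersection)."""
    if not criteria:
        return set()
    return set.intersection(*map(set, criteria))


targets = target_areas(
    high_value_residences,
    long_emergency_response,
    many_local_crimes,
)
print(sorted(targets))  # ['area-2', 'area-4']
```

Each input set looks harmless on its own; the privacy risk only appears once the sets are combined, which is why context matters more than any single dataset.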
A Fun Use Case
Challenges for Privacy in an Open Data World
And I haven’t even mentioned Trust, Provenance, Security, ……
• Data
– 100’s of datasets, 1000’s of files
– Very open domain(s)
– Very expensive to normalize
– Scaling complexity from high dimensionality
• Approach
– Pay-as-you-go approach, only process what you need
– Do not stick to a common model, use any you can find
– Generate interesting views and feed them to “analytics”
• Lessons learned
– Multiple models, depending on context
– Need to do things incrementally
– Lightweight generally better than heavyweight
Selected research results:
- Live deployment in Dublin
- Won prize in Semantic Web Challenge
- Paper at ISWC
- Paper at Hypertext
- Invited paper at Journal of Web Semantics
Research impact: what we have learned so far
There are plenty of interesting challenges!!
Documents + Metadata
Structure Entities Links Views Insight
…. Pay-as-you-go, Gain-as-you-go
Dublinked - Towards a robust test-bed for Open Data Research
IBM Connections
Social Media & Collaboration
IBM IOC
Interaction with Industry Solutions
Dublin City
Enterprise Platform
IBM Enterprise Cloud
Scalable compute, storage & network infrastructure
Provider 1…N
Open REST Web Services API
Catalog & Navigation Search & Query
Privacy & Security
Knowledge Representation & Reasoning
Publication & Annotation
Visualization & Analytics
Enterprise Citizen
IBM Products & Services
Robust models to organize and represent resources and their context
Scalable privacy and security of resources
Automated assimilation and sharing of resources
Compose resources for development, mash-up & visualization
Challenges include ..
IBM Research
Partners & People
Key
Represent knowledge efficiently for continuous machine reasoning and diagnosis
Research Testbed
What we do: Learning Systems to Help Diagnose the City
Problem
How can we provide City decision makers with explanations and diagnoses for events by applying machine reasoning techniques to a fusion of massive, rich, complex and dynamic data? How can we move from explanation to prediction?
Challenges
• Identifying relevant data and information
• Capturing and representing anomalies
• Correlating knowledge on heterogeneous data sources
• Advanced fusion of heterogeneous data from multiple sources
Goals
• Identification of the nature and cause of changes
• Explaining logical connection of knowledge across space and time
• Move from explanation to prediction
Anomaly Detected: Delayed buses, congested roads
Detection to Diagnosis?
Outline Research Roadmap
2013
2014
2015
Use Cases
Technology
• Provenance
• Privacy
• High-volume distributed querying
• Wide-scale distributed querying
• Distributed Entity Linking

• Fine-grain Access Control
• Streaming Analytics
• Distributed Reasoning
• Context Mining

• Lightweight Distributed Information Access
• Contextual Access
• Basic Access Control
• Distributed Entity Consolidation
• Graph Access

• Linked Data Cloud Context Retrieval
• Cross-agency Context Retrieval
• Cross-agency Analytics

• Cross Web-Enterprise Analytics
• Many-agency Analytics
• Public Safety Integrator

• Life analytics (social/health/public safety)
• High-risk/time-critical alerting
• Cross-agency Alerting
Data Warehouse
Dynamic Distributed Information Analytics