TWC LOGD: A Portal for Linking Open
Government Data
Li Ding, Deborah L. McGuinness, Jim Hendler
Tetherless World Constellation, Rensselaer Polytechnic Institute
Presented by Li Ding at Northwestern University, Dec 1, 2010
The TWC LOGD Portal Highlights
Real World Data: US, UK, China, …; health, energy, economy
End User Applications: community portal; fast, low-cost mashups
Applied Semantic Web: major partner of Data.gov; 8.5 billion triples in LOD
Data.gov and World-Wide Open Government Data Activities
Putting Government Data online
Timeline, 2009 to 2010 …
• January 1, 2009: “Openness will strengthen our democracy and promote efficiency and effectiveness in Government.” --- President Obama
• May 21, 2009: data.gov online
• June 30, 2009
• January 19, 2010: data.gov.uk online
• May 21, 2010: data.gov relaunch with semantic web featured
Many countries
• US
• UK
• Australia
• New Zealand
• …
First anniversary of Data.gov
The Semantic Web and RDF logo appeared on the front page of the US Data.gov website
Semantic Web deployed at Data.gov: RDF data, SPARQL endpoint, semantic mashups
RPI featured as a major partner of the US Data.gov project
Government Adoption Process
Timeline, 2009 to 2010 …
• May 21, 2009: data.gov online
• July, 2009: Data-gov Wiki@RPI online
• 2009-2010: Demos, Tutorials, Videos, SPARQL Endpoint
• May, 2010: SPARQL endpoint, RDF data, and demos replicated at Data.gov
• May 21, 2010: data.gov relaunch with semantic web featured
• Aug, 2010: Two-day Mashathon in Washington DC
• Oct, 2010: New application published by a team at DOE
• Oct, 2010: TWC LOGD Drupal site announced
Categories of Data.gov Datasets
Statistical data about various aspects of society; over 3000 datasets
Raw Government Data Now
Metadata in PDF
Data in Excel
Conversion: From Raw Tabular Data to RDF
Enhancement: Linking Open Government Data
ID | year | PHSY_ST | site-id | cost
   | 1998 |         |         | 10.0
   | 1999 |         | site123 | 11.3
   | 2000 | NY      |         | 8.3
   | 2001 |         |         | 20
site-id Latitude longitude
site123 43.993 -70.326
Year claims
2000 382
PHSY_ST: state abbreviation; ID: unique id
cost: unit is million US dollars; year: 1975-2008
Correlated dataset Complement dataset
Metadata (field definition) Metadata (value definition)
owl:sameAs
DS123:NY
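The conversion and enhancement steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the portal's actual converter: the namespace http://example.org/ds123/, the entry_N and vocab/ naming scheme, and the DBpedia URI chosen as the owl:sameAs target are all hypothetical. The two sample rows are taken from the table on this slide.

```python
import csv
import io

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
BASE = "http://example.org/ds123/"  # hypothetical namespace, not the portal's


def rows_to_ntriples(csv_text, base=BASE):
    """Conversion: turn each non-empty CSV cell into one N-Triples statement."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader, start=1):
        subject = f"<{base}entry_{index}>"
        for field, value in row.items():
            if value:  # raw tables often have empty cells; skip them
                predicate = f"<{base}vocab/{field}>"
                escaped = value.replace("\\", "\\\\").replace('"', '\\"')
                triples.append(f'{subject} {predicate} "{escaped}" .')
    return triples


def link_value(local_name, external_uri, base=BASE):
    """Enhancement: state that a local value URI denotes an external resource."""
    return f"<{base}value/{local_name}> <{OWL_SAME_AS}> <{external_uri}> ."


raw = "year,PHSY_ST,site-id,cost\n1999,,site123,11.3\n2000,NY,,8.3\n"
for line in rows_to_ntriples(raw):
    print(line)
# The DBpedia URI below is an assumed link target for the value "NY".
print(link_value("NY", "http://dbpedia.org/resource/New_York"))
```

Keeping the raw conversion (every cell becomes a literal) separate from the linking step mirrors the two stages named on the slides: the raw layer stays a faithful copy, and links are added on top.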
The Largest Real World LOD Dataset
• 8.5+ billion triples from the real world
• 7500+ LOD links
• Accessible via data browsers, e.g. Tabulator
LOGD Consumption Workflow
[Diagram] LOGD Application UI. Data sources on the WWW: TWC LOGD, data.gov.uk, dbpedia. Flow: SPARQL Query, SPARQL Results, Format Data (JSON, XML, CSV). Steps: Query Data, Integrate Data, Visualize Data.
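As an illustration of the "Format Data" step, the sketch below flattens a result in the W3C SPARQL JSON results format (variable names under head.vars, rows under results.bindings) into CSV, using only the Python standard library. The sample result and its variable names are made up for the example; a real application would receive this JSON from a SPARQL endpoint.

```python
import csv
import io
import json


def sparql_json_to_csv(result_text):
    """Flatten a SPARQL JSON result document into CSV text."""
    result = json.loads(result_text)
    variables = result["head"]["vars"]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(variables)
    for binding in result["results"]["bindings"]:
        # An unbound variable is simply absent from the binding.
        writer.writerow([binding.get(v, {}).get("value", "") for v in variables])
    return out.getvalue()


# Hypothetical query result with one unbound variable in the second row.
sample = json.dumps({
    "head": {"vars": ["state", "cost"]},
    "results": {"bindings": [
        {"state": {"type": "literal", "value": "NY"},
         "cost": {"type": "literal", "value": "8.3"}},
        {"cost": {"type": "literal", "value": "20"}},
    ]},
})
print(sparql_json_to_csv(sample))
```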
Exhibit Visualization API
Data.gov: CASTNET Ozone (CSV)
epa.gov: CASTNET Site (CSV)
Data Mashup, Web Application Mashup, Visualization Mashup
1. Convert raw datasets into linkable RDF
2. Query multiple RDF datasets via SPARQL endpoint
3. Drill down for details
4. Surf to EPA applications
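The data mashup idea, combining ozone readings with site locations through a shared site identifier, can be sketched as a plain join. Note that the demo itself queries RDF through a SPARQL endpoint; the in-memory Python join below is a simplified stand-in for that query. The column names and the ozone values are invented for the example; the coordinates for site123 reuse the sample row shown earlier in the deck.

```python
import csv
import io


def join_on_site(readings_csv, sites_csv, key="site-id"):
    """Attach site coordinates to each reading that shares the same key."""
    sites = {row[key]: row for row in csv.DictReader(io.StringIO(sites_csv))}
    joined = []
    for reading in csv.DictReader(io.StringIO(readings_csv)):
        site = sites.get(reading[key])
        if site is not None:  # drop readings whose site is unknown
            joined.append({**reading, **site})
    return joined


# Hypothetical sample rows; the ozone values are invented for the example.
readings = "site-id,ozone\nsite123,41\nsite999,38\n"
sites = "site-id,latitude,longitude\nsite123,43.993,-70.326\n"
for row in join_on_site(readings, sites):
    print(row)
```

Once each reading carries a latitude and longitude, it can be handed to a map visualization such as the Exhibit view used in the demo.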
Created by Dominic DiFranzo, PhD student at RPI, http://www.data.gov/semantic/Castnet/html/exhibit
Mashing up LOGD Data
Trends in Smoking Prevalence, Tobacco Policy Coverage and Tobacco Prices (1991-2007)
Smoking Prevalence vs. Tax, Policy …
Extensible and accountable mashups with NCI
• Extensible mashups via Linked Data
– Diverse datasets from NIH
– Potentially linking to “unemployment rate”
• Accountable mashups via provenance
– Annotate datasets used in demos
– Feed users’ comments back to the government contact (e.g. %)
Created by Li Ding, Tim Lebo, RPI, http://logd.tw.rpi.edu/project/popscigrid
Smoking Prevalence vs. Other Factors
Integrating different sources for discovery
Created by Sarah Magidson, U. Chicago. http://data-gov.tw.rpi.edu/demo/stable/tobacco-smoker/demo-state-10026-smoke-rate-statevarsapi.html
[Spatial Mashup] Data.gov (Population) + NIH (Tobacco Tax, Smoking rate)
Government data provides knowledge for population science studies
Linking GDP of the US and China
Linking international government data meaningfully
[Chart] GDP of China (billion Chinese yuan) and GDP of the US (billion dollars), 2000 to 2010; chart labels: 8.3, 6.3
[Temporal Mashup] bea.gov + federalreserve.gov + stats.gov.cn
Created by Li Ding, RPI, http://logd.tw.rpi.edu/demo/linking_us_and_chinas_gdp_data/
XHTML+RDFa
ARC2
http://data-gov.tw.rpi.edu/
Semantic Search on LOGD data
Rich snippet in results
Web Search (HTML)
Rich Snippet (RDFa)
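To show what RDFa markup behind a rich snippet might look like, here is a minimal sketch that wraps a demo's title and creator in XHTML+RDFa attributes (about, property) using Dublin Core terms. The function name and the choice of dcterms:title and dcterms:creator are assumptions for illustration; the pages on the portal may use different vocabularies.

```python
from html import escape


def rdfa_snippet(page_uri, title, creator):
    """Return an XHTML fragment whose RDFa attributes describe one page."""
    return (
        f'<div xmlns:dcterms="http://purl.org/dc/terms/" about="{escape(page_uri)}">'
        f'<h2 property="dcterms:title">{escape(title)}</h2>'
        f'<span property="dcterms:creator">{escape(creator)}</span>'
        "</div>"
    )


# Title, creator, and URL come from a demo credited elsewhere in this deck.
print(rdfa_snippet(
    "http://data-gov.tw.rpi.edu/demo/stable/demo-1187-40x-wildfire-budget.html",
    "US Wildland Fire and Budget",
    "Li Ding",
))
```

A crawler that understands RDFa can read the title and creator as structured statements about the page URI, while a plain browser still renders ordinary HTML.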
Adding Social Factor to Mashups
[Diagram] Nodes: Raw Data, RDF, User, Other Social Web Apps. Edges: Publish*, Enhance*, consume*, feedback, Import/export.
• Import socially contributed data, e.g. DBpedia
• Let users contribute
– links
– feedback
Wildland fire (NIFC)
Budget on wildfire, “DOI” and “USDA” (OMB)
Category:Wildfires In The United States
Created by Li Ding, RPI, http://data-gov.tw.rpi.edu/demo/stable/demo-1187-40x-wildfire-budget.html
[Temporal Mashup] Data.gov (statistics+ budget) + Wikipedia (famous fires)
US Wildland Fire and Budget
Linking to Wikipedia (socially contributed)
White House Visitor Search
Leveraging linked data (DBpedia & New York Times)
“POTUS”
dbpedia:Barack_Obama
Created by Dominic DiFranzo, Evan Patton, RPI, http://data-gov.tw.rpi.edu/demo/stable/white-house-visitor/top100-visitees.php
[Person Mashup] Data.gov (statistics) + DBpedia (personal profiles) + NYTimes (news)
[Technologies] Semantic MediaWiki, Google Visualization, iPad apps available in the Apple Store
The White House
Semantic Wiki
Wikipedia
NYTimes
Created by Sarah Magidson, http://data-gov.tw.rpi.edu/demo/linked/demo-401-usps-news.html
[Temporal Mashup] Data.gov (budget) + USPS + User Contributed News
USPS Spending and News
Government data + user feedback
Current Status of TWC LOGD
http://data-gov.tw.rpi.edu (Semantic MediaWiki) => http://logd.tw.rpi.edu (Drupal + RDFa)
Website Statistics
• 378,128 page hits
• 28,481 visits
• 16,041 visitors
• 4,126 cities
• 34 countries
Note: the above statistics are about http://data-gov.tw.rpi.edu. Dataset access not counted.
Data Abstraction and Versioning
[Diagram] Levels of structural data granularity (high to low); labels: Source, Dataset, Version, Table, Record, Conversion Layer. Data publishing stages: OGD (part1) snapshot, OGD (part2) snapshot, …, LOGD (raw), LOGD (e1), …
Provenance and Workflow
[Diagram] Workflow steps: Access, Convert, Enhance, Version, SemDiff. Provenance relations: create, derive, revision.
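One way to read the "derive" relation is as a chain that lets a consumer trace any published layer back to its origin. The sketch below follows such links in a small dictionary. It is a hypothetical illustration: the function name and the specific links are assumptions, with stage names borrowed from the publishing stages shown earlier (OGD snapshot, LOGD(raw), LOGD(e1)).

```python
def lineage(artifact, derived_from):
    """Follow derivation links back toward the original source, nearest first."""
    chain = []
    seen = set()
    current = artifact
    # Stop when there is no further link, or when a cycle would repeat a node.
    while current in derived_from and current not in seen:
        seen.add(current)
        current = derived_from[current]
        chain.append(current)
    return chain


# Hypothetical derivation links named after the publishing stages.
derived_from = {
    "LOGD(e1)": "LOGD(raw)",
    "LOGD(raw)": "OGD snapshot",
}
print(lineage("LOGD(e1)", derived_from))
```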
Linking Open Source Community
Linking semantic web with web developers
• Social Semantic Web extensions/modules to popular CMS, e.g. Semantic Wiki, Drupal
• Process/consume integrated gov data in a number of different ways: social networks, natural language technologies, workflows, search…
Education: Linked Tutorials, Demos…
[Diagram] Node types: project, demo, technology, tutorial, video, dataset, source, person. Properties: dcterms:contributor, logd:uses_dataset, logd:uses_technology, logd:uses_datasource, dcterms:relation, dcterms:source.
http://logd.tw.rpi.edu/tutorials
Summary of the TWC LOGD Portal
Real World Data
• 8.5+ billion triples
• 400+ datasets
• 10+ sources
• Many domains
Semantic Web Technology
• Completely open source
• Demos/tutorials/videos
Community and Users
• Partner of US government
• Open source community
• Education in university
http://logd.tw.rpi.edu
Beyond just dogfood; Linking Open Government Data Now!
The Team and Sponsors
• Leaders
– Jim Hendler
– Deborah L. McGuinness
– Li Ding
• Members
– Dominic DiFranzo
– Sarah Magidson
– James Michaelis
– Alvaro Graves
– Jin Guang Zheng
– Xian Li
– Gregory Todd Williams
– Tim Lebo
– Zhenning Shangguan
– Devin Gaffney
– Peter Coons
– Adam Bell
– William Cooper
– Brian Zaik
– Johanna Flores
Government Sponsors
DARPA, NSF, NASA, IARPA, NIH/NCI, …