Data.gov Wiki: A Semantic Web Approach to Government Data Li Ding, Dominic DiFranzo, Sarah Magidson, Jim Hendler Tetherless World Constellation Aug 7, 2009
Page 1

Data.gov Wiki: A Semantic Web Approach to Government Data

Li Ding, Dominic DiFranzo, Sarah Magidson, Jim Hendler

Tetherless World Constellation, Aug 7, 2009

Page 2

Government Data on the Web

Page 3

Objectives

• Investigate the role of the semantic web in producing, processing and utilizing government datasets
– To enrich the value of data via normalizing, linking and information extraction
– To realize the value of data via applications, esp. visualization
– To support web developers via machine-friendly data access and web services

Page 4

Semantic Web Architecture for Government Data

[Architecture diagram. Stages: Convert Data, Link & Enrich Data, View & Use Data. Components shown: Data Processors (Web Services & Analyzers), SPARQL Web Service, XSLT Service, Diff Service, RSS Generator, SPARQL End Point, Link Annotator, Sem Wiki, Linked Data, GOV data (RDF), RDF/XML. Output formats and viewers: Google Viz, MIT Exhibit, RSS 1.0, tagCloud, CSV, XSL, …, Tabulator.]

Li Ding, Dominic DiFranzo, Sarah Magidson, and Jim Hendler · Tetherless World Constellation · Rensselaer Polytechnic Institute · Aug 7 2009 · http://data-gov.tw.rpi.edu/

Page 5

Translate GOV data into RDF

• Principle 1: Keep the translation minimal
– Keep table structure
– Skip parsing values, unique property namespace
• Principle 2: Let the translation meet the Web
– RDF/XML as output
– Partition of big datasets, dereferenceable URIs
• Principle 3: Make the translation extensible
– Property definitions updatable via Semantic MediaWiki
• Principle 4: Preserve knowledge provenance
– Record provenance metadata using DC and FOAF
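Principle 1 can be illustrated with a minimal sketch, which is not the authors' converter. It treats every table row as one subject, mints one property per column in a dataset-specific namespace, and copies cell values as plain literals without parsing them. The `/vocab/p/<dataset id>/` namespace pattern is taken from the sample queries in these slides; the entry URI base, the `ozone` column, and the function names are assumptions made for illustration. The output is N-Triples rather than the RDF/XML named in Principle 2, purely to keep the example short.

```python
import csv
import io
from urllib.parse import quote

# Namespace pattern as it appears in the sample SPARQL queries.
VOCAB_BASE = "http://data-gov.tw.rpi.edu/vocab/p/"
# Hypothetical base for row subjects; the slides do not give one.
ENTRY_BASE = "http://example.org/dataset/"


def escape_literal(value):
    """Escape a string so it is safe inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def table_to_ntriples(csv_text, dataset_id):
    """Translate a CSV table into N-Triples, one subject per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{ENTRY_BASE}{dataset_id}/entry{row_number}>"
        for column, value in row.items():
            # Skip overflow cells (column is None) and empty or missing cells.
            if column is None or not value:
                continue
            # One property per column, scoped to this dataset's namespace.
            name = quote(column.strip().lower().replace(" ", "_"), safe="")
            predicate = f"<{VOCAB_BASE}{dataset_id}/{name}>"
            # Values stay untyped strings: no parsing of numbers or dates.
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


if __name__ == "__main__":
    sample = "site_id,ozone\nSHN418,42\n"  # 'ozone' is an invented column
    for line in table_to_ntriples(sample, 8):
        print(line)
```

Keeping values as untyped strings defers all interpretation to later steps, which is one way to read "skip parsing values".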

Dominic

Page 6

Translated Dataset Statistics

• data.gov hosts 432 datasets:
– 390 “Raw Data Catalog” and 41 “Tool Catalog”
– from 37 US government agencies
• We have 16 translated RDF datasets:
– 13,532,385 table entries
– 2,927,399,269 triples
– 2,526 properties
• data.gov mentioned 458 data access points (mainly tables):
– 3: RSS, ATOM
– 248: csv/txt
– 46: xml
– 66: xls (MS Excel)
– 14: kml or kmz
– 22: ESRI shape

Page 7

Cloud of government data

[Figure: cloud of translated datasets. Category labels: Budget, Population, Energy and Utilities, Geography and Environment. Datasets shown:]
• (#8) CASTNET Ozone
• (#9) CASTNET Visibility
• (#10) Residential Energy Consumption Survey
• (#34) Worldwide M1+ Earthquakes past 7 days
• (#90) 2005-2007 ACS PUMS Housing
• (#91) 2005-2007 ACS PUMS Population
• (#191) 2005 Toxics Release Inventory
• (#249) 2006 Toxics Release Inventory
• (#397) 2007 Toxics Release Inventory
• (#401) Budget Authority and offsetting receipts 1976-2014
• (#402) Outlays and offsetting receipts 1962-2014
• (#403) Governmental Receipts 1962-2014
• (@10001) CASTNET sites


Page 8

Issues in Data.gov

• Duplicated datasets: some datasets are part of another dataset.
– Dataset 140 (2005 Toxics Release Inventory data for the state of California (EPA)) is a subset of Dataset 191.
• Formatting issues: the format of some datasets is not friendly to machine processing.
– Dataset 37 (Lower Colorado River Daily Average Water Elevations and Releases (US Bureau of Reclamation)).
– Dataset 335 (National Longitudinal Surveys (US Bureau of Labor Statistics)) tells you how to order data from the government.
• Access point issues: some access points are interactive web pages, which are not friendly to machine access.
– Dataset 330 (Local Area Unemployment Statistics (US Bureau of Labor Statistics)).

Sarah

Page 9

Demos

• Visualization
– Tabulator
– Google Visualization (live)
– Exhibit (live)
• Computation
– RSS generation
– TDB query (live)
• Live demos:
– http://onto.rpi.edu/joseki/
– http://data-gov.tw.rpi.edu/wiki/Demos

Dominic, Sarah

Page 10

TODO List

• More demos
– US Pollution Map
– US agency
– Earthquake in RPI Map
• Getting more data linked
– Link properties
– Link instance data
• More web services
– Gov data auto-completion
• SPARQL integration for 2B triples
– TDB
– 4Store

[Figure labels: (#9) CASTNET Visibility, (#8) CASTNET Ozone, (@10001) CASTNET sites]

Page 11

Sample SPARQL queries

• List datasets:
    SELECT ?s ?o
    WHERE { ?s <http://purl.org/dc/elements/1.1/source> ?o }

• List all loaded documents:
    SELECT ?s
    WHERE { ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Document> }

• List descriptions of an EPA site (integration):
    SELECT ?s
    WHERE { ?s <http://data-gov.tw.rpi.edu/vocab/p/8/site_id> "SHN418" . }

• List contributions of each agency (count):
    PREFIX dgp92: <http://data-gov.tw.rpi.edu/vocab/p/92/>
    SELECT ?ag (COUNT(*) AS ?count)
    WHERE { ?entry dgp92:agency ?ag . }
    GROUP BY ?ag
    ORDER BY ?ag

• List agencies (distinct):
    PREFIX dgp401: <http://data-gov.tw.rpi.edu/vocab/p/401/>
    SELECT DISTINCT ?ag ?ag_code ?branch ?branch_code
    WHERE {
      ?entry dgp401:bureau_name ?ag ;
             dgp401:bureau_code ?ag_code ;
             dgp401:agency_name ?branch ;
             dgp401:agency_code ?branch_code .
    }
    ORDER BY ?ag
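Queries like these are typically sent to an endpoint over HTTP. The sketch below shows how a request URL could be built, assuming the endpoint accepts the standard SPARQL protocol `query` parameter on a GET request. The endpoint URL is a placeholder, since the slides list demo pages but not the exact service path, and `build_query_url` is a hypothetical helper name. Only the URL is constructed; no request is sent.

```python
from urllib.parse import urlencode

# Placeholder endpoint; substitute the real SPARQL service URL.
ENDPOINT = "http://example.org/sparql"

# Query taken from the slide: find entries for one EPA site id.
SITE_QUERY = (
    'SELECT ?s WHERE { '
    '?s <http://data-gov.tw.rpi.edu/vocab/p/8/site_id> "SHN418" . }'
)


def build_query_url(endpoint, query):
    """Return a GET URL that carries the query in the 'query' parameter."""
    # urlencode percent-encodes braces, quotes, angle brackets and spaces.
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(build_query_url(ENDPOINT, SITE_QUERY))
```

The resulting URL can be fetched with any HTTP client; the result format depends on the endpoint and on the `Accept` header sent with the request.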

