Data.gov Wiki: A Semantic Web Approach to Government Data
Li Ding, Dominic DiFranzo, Sarah Magidson, Jim Hendler
Tetherless World Constellation · Aug 7, 2009
Government Data on the Web
Objectives
• Investigate the role of the Semantic Web in producing, processing and utilizing government datasets
– To enrich the value of data via normalizing, linking and information extraction
– To realize the value of data via applications, esp. visualization
– To support web developers via machine-friendly data access and web services
Semantic Web Architecture for Government Data
[Figure: architecture diagram with three stages: Convert Data, Link & Enrich Data, View & Use Data. Components shown: GOV data (RDF), Linked Data, Sem Wiki, Link Annotator, Data Processors (Web Services & Analyzers: SPARQL Web Service, XSLT Service, Diff Service, RSS Generator), SPARQL End Point. Formats and viewers shown: RDF/XML, CSV, XSL, …, Tabulator, Google Viz, MIT Exhibit, RSS 1.0, tagCloud, …]
Li Ding, Dominic DiFranzo, Sarah Magidson, and Jim Hendler · Tetherless World Constellation · Rensselaer Polytechnic Institute · Aug 7 2009 · http://data-gov.tw.rpi.edu/
Translate GOV data into RDF
• Principle 1: Keep the translation minimal
– Keep the table structure
– Skip parsing values; use a unique property namespace
• Principle 2: Let the translation meet the Web
– RDF/XML as output
– Partition big datasets; use dereferenceable URIs
• Principle 3: Make the translation extensible
– Property definitions updatable via Semantic MediaWiki
• Principle 4: Preserve knowledge provenance
– Record provenance metadata using DC and FOAF
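The first principle can be illustrated with a minimal sketch, which is not the project's actual converter. It maps each table row to one subject and each non-empty cell to one triple, leaving cell values as unparsed string literals. The property namespace pattern follows the URIs used in the sample queries in these slides; the entry URI pattern, the column-name normalization, and the function names are assumptions made for illustration, and the output is N-Triples rather than RDF/XML to keep the example short.

```python
import csv
import io

# Property namespace pattern as it appears in the sample SPARQL queries.
PROP_NS = "http://data-gov.tw.rpi.edu/vocab/p/{dataset_id}/"
# Hypothetical entry URI pattern (assumption, not taken from the slides).
ENTRY_NS = "http://example.org/dataset/{dataset_id}/entry/"


def escape_literal(value):
    """Escape a raw cell value for use as an N-Triples string literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def property_name(column):
    """Assumed normalization: trim, lowercase, spaces to underscores."""
    return column.strip().lower().replace(" ", "_")


def table_to_ntriples(csv_text, dataset_id):
    """Yield one N-Triples line per non-empty cell, one subject per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{ENTRY_NS.format(dataset_id=dataset_id)}{row_number}>"
        for column, value in row.items():
            # Skip overflow cells (no header) and empty cells.
            if column is None or not value:
                continue
            predicate = (f"<{PROP_NS.format(dataset_id=dataset_id)}"
                         f"{property_name(column)}>")
            # Values are kept as plain strings: no type parsing.
            yield f'{subject} {predicate} "{escape_literal(value)}" .'
```

Calling `table_to_ntriples(csv_text, 8)` on a table with a `Site ID` column would produce triples whose predicate is `http://data-gov.tw.rpi.edu/vocab/p/8/site_id`. A real converter would also need to handle column names containing characters that are not valid in URIs, which this sketch ignores.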
Dominic
Translated Dataset Statistics
• data.gov hosts 432 datasets:
– 390 in the “Raw Data Catalog” and 41 in the “Tool Catalog”
– from 37 US government agencies
• We have 16 translated RDF datasets
– 13,532,385 table entries
– 2,927,399,269 triples
– 2,526 properties
• data.gov mentions 458 data access points (mainly tables)
– 3: RSS, ATOM
– 248: csv/txt
– 46: xml
– 66: xls (MS Excel)
– 14: kml or kmz
– 22: ESRI shape
Cloud of government data
[Figure: cloud of translated datasets. Category labels: Budget; Population; Energy and Utilities; Geography and Environment. Datasets shown:]
– (#8) CASTNET Ozone
– (#9) CASTNET Visibility
– (#10) Residential Energy Consumption Survey
– (#34) Worldwide M1+ Earthquakes, past 7 days
– (#90) 2005-2007 ACS PUMS Housing
– (#91) 2005-2007 ACS PUMS Population
– (#191) 2005 Toxics Release Inventory
– (#249) 2006 Toxics Release Inventory
– (#397) 2007 Toxics Release Inventory
– (#401) Budget Authority and offsetting receipts, 1976-2014
– (#402) Outlays and offsetting receipts, 1962-2014
– (#403) Governmental Receipts, 1962-2014
– (@10001) CASTNET sites
Issues in Data.gov
• Duplicated datasets: some datasets are part of another dataset.
– Dataset 140 (2005 Toxics Release Inventory data for the state of California (EPA)) is a subset of Dataset 191.
• Formatting issues: the format of some datasets is not friendly to machine processing.
– Dataset 37 (Lower Colorado River Daily Average Water Elevations and Releases (US Bureau of Reclamation)).
– Dataset 335 (National Longitudinal Surveys (US Bureau of Labor Statistics)) only tells you how to order the data from the government.
• Access point issues: some access points are interactive web pages, which are not friendly to machine access.
– Dataset 330 (Local Area Unemployment Statistics (US Bureau of Labor Statistics)).
Sarah
Demos
• Visualization
– Tabulator
– Google Visualization (live)
– Exhibit (live)
• Computation
– RSS generation
– TDB query (live)
• Live demos:
– http://onto.rpi.edu/joseki/
– http://data-gov.tw.rpi.edu/wiki/Demos
Dominic, Sarah
TODO List
• More demos
– US Pollution Map
– US agency
– Earthquake in RPI Map
• Getting more data linked
– Link properties
– Link instance data
• More web services
– Gov data auto-completion
• SPARQL integration for 2B triples
– TDB
– 4Store
[Figure: datasets (#9) CASTNET Visibility, (#8) CASTNET Ozone and (@10001) CASTNET sites]
Sample SPARQL queries
• List datasets:
– SELECT ?s ?o WHERE { ?s <http://purl.org/dc/elements/1.1/source> ?o }
• List all loaded documents:
– SELECT ?s WHERE { ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Document> }
• List descriptions of an EPA site (integration):
– SELECT ?s WHERE { ?s <http://data-gov.tw.rpi.edu/vocab/p/8/site_id> "SHN418" . }
• List contributions of each agency (count):
– PREFIX dgp92: <http://data-gov.tw.rpi.edu/vocab/p/92/>
  SELECT ?ag (COUNT(*) AS ?count) WHERE { ?entry dgp92:agency ?ag . } GROUP BY ?ag ORDER BY ?ag
• List agencies (distinct):
– PREFIX dgp401: <http://data-gov.tw.rpi.edu/vocab/p/401/>
  SELECT DISTINCT ?ag ?ag_code ?branch ?branch_code WHERE { ?entry dgp401:bureau_name ?ag ; dgp401:bureau_code ?ag_code ; dgp401:agency_name ?branch ; dgp401:agency_code ?branch_code . } ORDER BY ?ag
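Queries like these can also be sent to a SPARQL endpoint programmatically. The following is a hedged sketch using only the Python standard library: it sends the query as a `query` parameter over HTTP GET and asks for the standard SPARQL JSON results format. The endpoint URL is a placeholder, since the slides give the Joseki demo address but not the exact service path, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder endpoint (assumption): substitute the real SPARQL service URL.
ENDPOINT = "http://example.org/sparql"


def build_request(endpoint, query):
    """Build an HTTP GET request carrying the query, asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(json_text):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(json_text)
    variables = data["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in data["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=30):
    """Send the query and return the parsed rows (performs a network call)."""
    with urlopen(build_request(endpoint, query), timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

For example, `run_query(ENDPOINT, 'SELECT ?s WHERE { ?s <http://data-gov.tw.rpi.edu/vocab/p/8/site_id> "SHN418" . }')` would return one dict per matching entry, provided the endpoint is reachable and supports JSON results.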