
TWC LOGD: A Portal for Linking Open Government Data

Li Ding, Deborah L. McGuinness, Jim Hendler

Tetherless World Constellation, Rensselaer Polytechnic Institute

Presented by Li Ding at Northwestern University, Dec 1, 2010


The TWC LOGD Portal Highlights

Real World Data
• US, UK, China, …
• Health, energy, economy

End User Applications
• Community portal
• Fast, low-cost mashups

Applied Semantic Web
• Major partner of Data.gov
• 8.5 billion triples in LOD

Semantic Web Deployed at Data.gov

http://www.data.gov/semantic

Data.gov and World-Wide Open Government Data Activities

“Openness will strengthen our democracy and promote efficiency and effectiveness in Government.”

--- President Obama

Timeline (2009 to 2010):
• January 1, 2009: President Obama's statement on openness
• May 21, 2009: data.gov online
• June 30, 2009: Putting Government Data online
• January 19, 2010: data.gov.uk online
• May 21, 2010: data.gov relaunch with semantic web featured

Many countries
• US
• UK
• Australia
• New Zealand
• …


First anniversary of Data.gov

The Semantic Web and RDF logo appeared on the front page of the US Data.gov website


Semantic Web deployed at Data.gov: RDF data, SPARQL endpoint, semantic mashups


RPI featured as a major partner of the US Data.gov project


Government Adoption Process

Timeline (2009 to 2010):
• May 21, 2009: data.gov online
• July, 2009: Data-gov Wiki@RPI online
• 2009-2010: Demos, Tutorials, Videos, SPARQL Endpoint
• May, 2010: SPARQL End Point & RDF data & Demos replicated at Data.gov
• May 21, 2010: data.gov relaunch with semantic web featured
• Aug, 2010: Two-day Mashathon in Washington DC
• Oct, 2010: New Application published by a team at DOE
• Oct, 2010: TWC LOGD Drupal Site announced

The Largest Real World LOD Dataset

http://logd.tw.rpi.edu/twc-logd

Categories of Data.gov Datasets

• Statistical data about various aspects of society
• Over 3000 datasets

Raw Government Data Now

Metadata in PDF

Data in Excel

Conversion: From Raw Tabular Data to RDF
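A minimal sketch of what such a conversion can look like, assuming a naive mapping in which each table row becomes a subject and each non-empty cell becomes one literal-valued triple. The base URI, the `vocab/` predicate scheme, the sample rows, and the function name `rows_to_ntriples` are all hypothetical; the slides do not describe the portal's actual converter.

```python
import csv
import io

# Hypothetical base URI; not taken from the slides.
BASE = "http://example.org/dataset/ds123/"

# Illustrative sample rows; assumes column names are safe to use inside IRIs.
SAMPLE_CSV = """year,PHSY_ST,cost
2000,NY,8.3
2001,,20
"""


def rows_to_ntriples(csv_text, base=BASE):
    """Turn each non-empty cell of a CSV table into one N-Triples line."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader, start=1):
        subject = f"<{base}row/{index}>"
        for column, value in row.items():
            if not value:
                continue  # skip empty cells rather than emit empty literals
            predicate = f"<{base}vocab/{column}>"
            # Escape backslashes, quotes, and line breaks for N-Triples literals.
            literal = (
                value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r")
            )
            triples.append(f'{subject} {predicate} "{literal}" .')
    return triples


if __name__ == "__main__":
    for line in rows_to_ntriples(SAMPLE_CSV):
        print(line)
```

Keeping every value as a plain literal at this stage is a deliberate simplification: it keeps the raw conversion mechanical, and typing or linking values can then be treated as a separate enhancement step.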

Enhancement: Linking Open Government Data

Raw table:

ID   year   PHSY_ST   site-id   cost
     1998                       10.0
     1999             site123   11.3
     2000   NY                  8.3
     2001                       20

Metadata (field definition)
• PHSY_ST: state abbreviation
• ID: unique id

Metadata (value definition)
• cost: unit is million US dollars
• year: 1975-2008

Complement dataset:

site-id   Latitude   longitude
site123   43.993     -70.326

Correlated dataset:

Year   claims
2000   382

Link: owl:sameAs on DS123:NY
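The enhancement step in this example can be sketched roughly as follows, assuming two operations: promoting a state abbreviation to a dataset-local resource that carries an `owl:sameAs` link, and attaching coordinates from the complement dataset by `site-id`. The `DS123` namespace, the DBpedia link target, and both function names are assumptions; the slide shows only `DS123:NY` and `owl:sameAs`, not the link target.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical namespace for dataset-local resources such as DS123:NY.
DS123 = "http://example.org/dataset/ds123/value/"

# Assumed link target; the slide does not say what DS123:NY is linked to.
EXTERNAL_LINKS = {"NY": "http://dbpedia.org/resource/New_York"}


def link_values(codes, namespace=DS123, links=EXTERNAL_LINKS):
    """Return (subject, predicate, object) triples for codes with a known link."""
    triples = []
    for code in codes:
        target = links.get(code)
        if target is not None:
            triples.append((namespace + code, OWL_SAME_AS, target))
    return triples


def add_site_coordinates(records, sites):
    """Copy each record, adding site attributes when its site-id is known."""
    enriched = []
    for record in records:
        merged = dict(record)
        merged.update(sites.get(record.get("site-id"), {}))
        enriched.append(merged)
    return enriched


if __name__ == "__main__":
    records = [
        {"year": "1999", "site-id": "site123", "cost": "11.3"},
        {"year": "2000", "PHSY_ST": "NY", "cost": "8.3"},
    ]
    sites = {"site123": {"latitude": 43.993, "longitude": -70.326}}
    print(link_values(["NY"]))
    print(add_site_coordinates(records, sites))
```

Records without a matching `site-id` pass through unchanged, so the enhancement only ever adds information to the raw conversion.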


The Largest Real World LOD Dataset

• 8.5+ billion triples from real world
• 7500+ LOD links
• Accessible via Data Browser, e.g. Tabulator

Consuming Linked Open Government Data

http://logd.tw.rpi.edu/demos

LOGD Consumption Workflow

[Diagram: the LOGD Application UI draws on WWW data sources (TWC LOGD, data.gov.uk, dbpedia). Workflow steps: Query Data (SPARQL Query, SPARQL Results), Format Data (JSON, XML, CSV), Integrate Data, Visualize Data.]
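The query and format steps of this workflow can be sketched as below, assuming the endpoint follows the standard SPARQL protocol (a `query` parameter on an HTTP GET) and returns the W3C SPARQL JSON results layout. The endpoint URL, the sample result, and the function names are hypothetical, and no network request is made.

```python
import csv
import io
import json
from urllib.parse import urlencode

# Hypothetical endpoint URL; substitute the endpoint you actually use.
ENDPOINT = "http://example.org/sparql"

# Hand-written sample in the SPARQL JSON results layout (not real portal output).
SAMPLE_RESULT = """{
  "head": {"vars": ["state", "cost"]},
  "results": {"bindings": [
    {"state": {"type": "literal", "value": "NY"},
     "cost": {"type": "literal", "value": "8.3"}}
  ]}
}"""


def build_request_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as an HTTP GET URL."""
    return endpoint + "?" + urlencode({"query": query})


def bindings_to_rows(result_json):
    """Flatten a SPARQL JSON result document into (header, rows)."""
    doc = json.loads(result_json)
    header = doc["head"]["vars"]
    rows = [
        [binding.get(var, {}).get("value", "") for var in header]
        for binding in doc["results"]["bindings"]
    ]
    return header, rows


def rows_to_csv(header, rows):
    """Serialize the flattened result as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(build_request_url("SELECT ?state ?cost WHERE { ?s ?p ?o }"))
    print(rows_to_csv(*bindings_to_rows(SAMPLE_RESULT)))
```

Unbound variables are written as empty cells, which keeps every CSV row the same width for downstream visualization code.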

Exhibit Visualization API

Sources:
• Data.gov: CASTNET Ozone (CSV)
• epa.gov: CASTNET Site (CSV)

Mashup types: data mashup, web application mashup, visualization mashup

Steps:
1. Convert raw dataset into linkable RDF
2. Query multiple RDF datasets via SPARQL endpoint
3. Drill down for details
4. Surf to EPA applications

Created by Dominic DiFranzo, PhD student at RPI, http://www.data.gov/semantic/Castnet/html/exhibit

Mashing up LOGD Data


Trends in Smoking Prevalence, Tobacco Policy Coverage and Tobacco Prices (1991-2007)

Smoking Prevalence vs. Tax, Policy …
Extensible and accountable mashups with NCI

Extensible mashups via Linked Data
• Diverse datasets from NIH
• Potentially linking to “unemployment rate”

Accountable mashups via provenance
• Annotate datasets used in demos
• Feed users’ comments back to the government contact (e.g. %)

Created by Li Ding, Tim Lebo, RPI, http://logd.tw.rpi.edu/project/popscigrid

Smoking Prevalence vs. Other Factors
Integrating different sources for discovery

Created by Sarah Magidson, U. Chicago. http://data-gov.tw.rpi.edu/demo/stable/tobacco-smoker/demo-state-10026-smoke-rate-statevarsapi.html

[Spatial Mashup] Data.gov (Population) + NIH (Tobacco Tax, Smoking rate)

Government data provides knowledge for population science studies

Linking GDP of the US and China
Linking international government data meaningfully

[Chart: GDP of China (billion Chinese yuan) and GDP of the US (billion US dollars), 2000 to 2010; labeled values 8.3 and 6.3]

[Temporal Mashup] bea.gov + federalreserve.gov + stats.gov.cn

Created by Li Ding, RPI, http://logd.tw.rpi.edu/demo/linking_us_and_chinas_gdp_data/
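A temporal mashup of this kind comes down to aligning series on a shared year key and bringing them to a common unit. The sketch below assumes the federalreserve.gov source supplies a yuan-per-dollar exchange rate, which the slide does not state; the function name and every number in the usage example are made-up placeholders, not real GDP or exchange-rate figures.

```python
def align_gdp(us_gdp_usd, cn_gdp_cny, cny_per_usd):
    """Join three year-keyed series and express China's GDP in US dollars.

    Only years present in all three inputs are kept, so a gap in any one
    source drops that year rather than producing a partial row.
    """
    years = sorted(set(us_gdp_usd) & set(cn_gdp_cny) & set(cny_per_usd))
    return [
        (year, us_gdp_usd[year], cn_gdp_cny[year] / cny_per_usd[year])
        for year in years
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    us = {2000: 10.0, 2001: 11.0}
    cn = {2000: 16.0, 2002: 20.0}
    rate = {2000: 8.0, 2001: 8.0}
    print(align_gdp(us, cn, rate))
```

Using an inner join on year is one design choice among several; a demo that needs to show gaps explicitly could keep all years and leave missing values empty instead.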

Adding Social Factor to Mashups

• Import socially contributed data, e.g. DBpedia
• Let users contribute
– links
– feedback

[Diagram: Raw Data, RDF, User, and Other Social Web Apps, connected by Publish*, Enhance*, consume*, feedback, and Import/export]

Wildland fire (NIFC)

Budget on wildfire: “DOI” and “USDA” (OMB)

Category:Wildfires In The United States

Created by Li Ding, RPI, http://data-gov.tw.rpi.edu/demo/stable/demo-1187-40x-wildfire-budget.html

[Temporal Mashup] Data.gov (statistics+ budget) + Wikipedia (famous fires)

US Wildland Fire and Budget
Linking to Wikipedia (socially contributed)

White House Visitor Search
Leveraging linked data (DBpedia & New York Times)

“POTUS”

dbpedia:Barack_Obama

Created by Dominic DiFranzo, Evan Patton, RPI, http://data-gov.tw.rpi.edu/demo/stable/white-house-visitor/top100-visitees.php

[Person Mashup] Data.gov (statistics) + DBpedia (personal profiles) + NYTimes (news)

[Technologies] Semantic MediaWiki, Google Visualization, iPad apps available in the Apple Store

The White House

Semantic Wiki

Wikipedia

NYTimes

Created by Sarah Magidson, http://data-gov.tw.rpi.edu/demo/linked/demo-401-usps-news.html

[Temporal Mashup] Data.gov (budget) + USPS + User Contributed News

USPS Spending and News
Government data + user feedback

Current Status of TWC LOGD

http://data-gov.tw.rpi.edu (Semantic MediaWiki) => http://logd.tw.rpi.edu (Drupal + RDFa)


Website Statistics

• 378,128 page hits
• 28,481 visits
• 16,041 visitors
• 4,126 cities
• 34 countries

Note: the above statistics are about http://data-gov.tw.rpi.edu. Dataset access not counted.


Data Abstraction and Versioning

[Diagram: levels of structural data granularity, from high to low, with labels Dataset, Version, Table, Source, Record, and Conversion Layer. Data publishing stages: OGD (part 1) snapshot, OGD (part 2) snapshot, LOGD (raw), LOGD (e1).]


Provenance and Workflow

[Diagram: workflow steps Convert, Enhance, Access, Version, and SemDiff, connected by the provenance relations create, derive, and revision]

Linking Open Source Community
Linking semantic web with web developers

• Social Semantic Web extensions/modules to popular CMS, e.g. Semantic Wiki, Drupal

• Process/consume integrated gov data in a number of different ways: social networks, natural language technologies, workflows, search…

Education: Linked Tutorials, Demos…

[Diagram: linked resources (project, demo, technology, tutorial, video, dataset, source, person) connected by dcterms:contributor, logd:uses_dataset, logd:uses_technology, logd:uses_datasource, dcterms:relation, and dcterms:source]

http://logd.tw.rpi.edu/tutorials
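To illustrate how such linked descriptions can be navigated, here is a small sketch that stores a few statements as prefixed-name triples and looks them up in both directions. The predicates are the ones named on this slide; the `ex:` resources, the choice of which resource types each predicate connects, and the helper names are hypothetical.

```python
# Predicates come from the slide; all ex: subjects and objects are invented.
TRIPLES = [
    ("ex:demo1", "dcterms:contributor", "ex:person1"),
    ("ex:demo1", "logd:uses_dataset", "ex:dataset1"),
    ("ex:demo1", "logd:uses_technology", "ex:technology1"),
    ("ex:demo2", "logd:uses_dataset", "ex:dataset1"),
    ("ex:tutorial1", "dcterms:relation", "ex:demo1"),
    ("ex:dataset1", "dcterms:source", "ex:source1"),
]


def objects(triples, subject, predicate):
    """Values reachable from a subject via a predicate, in insertion order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(triples, predicate, obj):
    """Resources that point at an object via a predicate, in insertion order."""
    return [s for s, p, o in triples if p == predicate and o == obj]


if __name__ == "__main__":
    # Which demos use a given dataset, and what does one demo depend on?
    print(subjects(TRIPLES, "logd:uses_dataset", "ex:dataset1"))
    print(objects(TRIPLES, "ex:demo1", "logd:uses_technology"))
```

The same two lookups, forward and reverse, are what let a portal page for a dataset list the demos built on it and a demo page list its datasets, technologies, and contributors.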


Summary of the TWC LOGD Portal

Real World Data
• 8.5+ billion triples
• 400+ datasets
• 10+ sources
• Many domains

Semantic Web Technology
• Completely open source
• Demos/tutorials/videos

Community and Users
• Partner of US government
• Open source community
• Education in university

http://logd.tw.rpi.edu

Beyond just dogfood; Linking Open Government Data Now!

The Team and Sponsors

• Leaders
– Jim Hendler
– Deborah L. McGuinness
– Li Ding

• Members
– Dominic DiFranzo
– Sarah Magidson
– James Michaelis
– Alvaro Graves
– Jin Guang Zheng
– Xian Li
– Gregory Todd Williams
– Tim Lebo
– Zhenning Shangguan
– Devin Gaffney
– Peter Coons
– Adam Bell
– William Cooper
– Brian Zaik
– Johanna Flores


Government Sponsors

DARPA, NSF, NASA, IARPA, NIH/NCI, …