Linking Open Government Data at Scale

Post on 12-Apr-2017

YOW! 2016 Conference: Melbourne December 1-2 ~ Brisbane December 5-6 ~ Sydney December 8-9

Bernadette Hyland
CEO & co-founder, 3 Round Stones, Inc.
@BernHyland
bhyland@3RoundStones.com

Some data is expensive to collect …

Data on the Web today

Lack of Context

credit: http://mhausenblas.info

Required Context

credit: http://mhausenblas.info

Linked data is intentionally for reuse

Linked Data

Refers to a set of best practices for publishing and interlinking data for access by both humans and machines. It relies on the RDF family of syntaxes (e.g., JSON-LD, N3, Turtle) and HTTP URIs.

Linked Data can be published by a person or organization behind the firewall or on the public Web. Linked Data published on the public Web is generally called Linked Open Data.

- W3C Linked Data Glossary

Something --a relationship--> Something else

UQ --is a--> University

UQ --label--> The University of Queensland
UQ --is a--> University
UQ --affiliation--> Group of 8

UQ --label--> The University of Queensland
UQ --affiliation--> Group of 8
UQ --number of undergraduate students--> 34228
UQ --number of students--> 48771
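The statements about UQ can be written down as subject / predicate / object triples. The following is a minimal sketch, not the speaker's code: it reuses the DBpedia property names that appear in the SPARQL query later in the talk, and it assumes that `http://dbpedia.org/resource/University_of_Queensland` is the DBpedia identifier for UQ. The helper names (`uri`, `literal`, `to_ntriples`) are made up for illustration.

```python
# Namespaces: dbo comes from the query in the talk; rdfs and xsd are the standard W3C ones.
DBO = "http://dbpedia.org/ontology/"
DBR = "http://dbpedia.org/resource/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumption: this is the DBpedia identifier for UQ; verify before relying on it.
UQ = DBR + "University_of_Queensland"
G8 = DBR + "Group_of_Eight_(Australian_universities)"


def uri(value):
    """Wrap an absolute URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value, lang=None, datatype=None):
    """Format a literal, optionally with a language tag or a datatype."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    text = '"' + escaped + '"'
    if lang:
        return text + "@" + lang
    if datatype:
        return text + "^^" + uri(datatype)
    return text


# The four statements from the slide, one tuple per arrow in the diagram.
TRIPLES = [
    (uri(UQ), uri(RDFS + "label"), literal("The University of Queensland", lang="en")),
    (uri(UQ), uri(DBO + "affiliation"), uri(G8)),
    (uri(UQ), uri(DBO + "numberOfUndergraduateStudents"), literal(34228, datatype=XSD + "integer")),
    (uri(UQ), uri(DBO + "numberOfStudents"), literal(48771, datatype=XSD + "integer")),
]


def to_ntriples(triples):
    """Serialize triples in N-Triples form: one statement per line, ending in ' .'."""
    return "\n".join(" ".join(triple) + " ." for triple in triples)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```

Each line of the output is one complete statement, which is why graphs from different publishers can be concatenated without a shared schema.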

credit: http://json-ld.org/

credit: https://callimachusproject.org

# G8 universities ordered by the number of students
# at each university.
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?name ?students ?undergrads
where {
  ?s dbo:affiliation <http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)> .
  ?s rdfs:label ?name .
  OPTIONAL { ?s dbo:numberOfStudents ?students }
  OPTIONAL { ?s dbo:numberOfUndergraduateStudents ?undergrads }
  FILTER ( lang(?name) = "en" )
}
ORDER BY DESC(?students)
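A query like this is sent to a SPARQL endpoint over plain HTTP. The sketch below shows one way to do that with the Python standard library. It assumes the public DBpedia endpoint at `http://dbpedia.org/sparql`, which the slides do not name, and the function names (`build_request`, `rows`, `run`) are illustrative only. The request follows the SPARQL 1.1 Protocol (a `query` parameter on a GET request) and asks for the standard JSON results format.

```python
import json
import urllib.parse
import urllib.request

# Assumption: DBpedia's public SPARQL endpoint; it is not named in the slides.
ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?name ?students ?undergrads
WHERE {
  ?s dbo:affiliation <http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)> .
  ?s rdfs:label ?name .
  OPTIONAL { ?s dbo:numberOfStudents ?students }
  OPTIONAL { ?s dbo:numberOfUndergraduateStudents ?undergrads }
  FILTER ( lang(?name) = "en" )
}
ORDER BY DESC(?students)
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def rows(results):
    """Flatten SPARQL JSON results into dicts; unbound OPTIONAL variables become None."""
    variables = results["head"]["vars"]
    return [
        {name: binding.get(name, {}).get("value") for name in variables}
        for binding in results["results"]["bindings"]
    ]


def run(query, endpoint=ENDPOINT):
    """Send the query over the network and return the flattened rows."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=30) as response:
        return rows(json.load(response))
```

Calling `run(QUERY)` performs the network request. The two `OPTIONAL` clauses mean a university without a student count still appears in the result, with `None` in that column.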

Diagram: "my data" as a graph of linked statements
• my data: collected by a collector; has a measurement
• collector: a Person; first name Michael; last name Hausenblas
• measurement: date 2011-01-01; value 0; units of measure degrees Centigrade; collected at Galway Airport
• ...

or

Linked Data on the Web

“Linked Data was part of my initial vision for the Web and is an important part of the Web’s future. The Web took off as a web of hyperlinked documents which were exciting to read, but which could not be effectively used as data.”

- Tim Berners-Lee

The Semantic Web morphed when it hit the marketplace

Governments & NGOs publishing & consuming Linked Data

[Figure: snapshots dated 07 Nov 2007, 10 Nov 2007, 28 Feb 2008, 31 Mar 2008, 18 Sep 2008, 05 Mar 2009, 27 Mar 2009, 14 Jul 2009, 22 Sep 2009, 22 Sep 2010]

• Widens EPA’s audience (justifies relevance) for research and environmental justice
• More cost-effective than web portals backed by relational databases
• Used for scientific R&D, green chemistry, and more
• Increased transparency

https://opendata.epa.gov

7 Steps to Publish Linked Data

Source: W3C Best Practices for Publishing Linked Data, see https://www.w3.org/TR/ld-bp/

Step #1 - Identify

Identify the dataset(s) to be modeled
• Request a copy of the logical and physical model of the database(s)
• Obtain data extracts (i.e., databases and/or spreadsheets) or create data in a way that can be replicated.

Step #2 - Model Data

Model data without context to allow for reuse and easier merging of data sets
• Traditional DBAs organize data for specified Web services or applications
• In Linked Data, application logic does not drive the data schema, concepts, etc.

Step #2 - Modeling (cont)

Look for real world objects of interest (e.g., people, places, things, locations, etc.) and model them.
• Investigate how others are already modeling similar or related data.
• Look for duplication & normalize the data
• Use common sense to decide whether or not to make a link

Step #2 - Modeling (cont)

• Connect data from different sources & authoritative vocabularies
• Use URIs as names for your objects
• Put aside immediate needs of any application
• Don’t think about how an application will use your data
• Do think about time and how the data will change over time.

Steps #3 & 4 - Name & Describe

Identifiers are at the heart of how things become useful as linked data.

We use the same mechanism for connecting data as the Web: the humble HTTP URI.

The Web is formed by HTTP URIs that are essentially connections linking pieces of information together.
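Naming things with HTTP URIs usually comes down to a small, predictable minting rule. The sketch below is one possible rule, not the one used in the talk: the base `http://example.org/id/` and the function names `slug` and `mint_uri` are hypothetical. It only aims to show that the same real-world thing should always map to the same URI, regardless of spacing or capitalization in the source data.

```python
import re

# Hypothetical base for minted identifiers; replace with a domain you control.
BASE = "http://example.org/id/"


def slug(text):
    """Lower-case the text and collapse every run of non-alphanumerics into one hyphen."""
    lowered = text.strip().lower()
    return re.sub(r"[^a-z0-9]+", "-", lowered).strip("-")


def mint_uri(kind, key, base=BASE):
    """Build a stable HTTP URI of the form <base><kind>/<key>.

    Note: this simple rule discards non-ASCII characters, so it only suits
    keys that stay distinct after slugging.
    """
    kind_part, key_part = slug(kind), slug(key)
    if not kind_part or not key_part:
        raise ValueError("kind and key must contain at least one letter or digit")
    return base + kind_part + "/" + key_part


if __name__ == "__main__":
    print(mint_uri("university", "The University of Queensland"))
```

Because the rule is deterministic, re-running a conversion never produces a second identifier for the same object, which is what lets other datasets link to it safely.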

Steps #5, 6 & 7 - Convert, Publish & Maintain

5. Write a script or process to convert the data set repeatedly
6. Publish to the Web and announce it!
7. Maintenance strategy
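Step 5 asks for a conversion that can be re-run whenever the source data changes. A minimal sketch of such a script follows, assuming the extract is a CSV with `id`, `date` and `value` columns. The column names, the `example.org` namespaces and the record id `m1` are hypothetical; the date and value are the ones from the measurement example earlier in the talk. Values are assumed to be clean, so no literal escaping is done here.

```python
import csv
import io

# Hypothetical namespaces for the minted subjects and the vocabulary terms.
BASE = "http://example.org/id/measurement/"
VOCAB = "http://example.org/def/"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical extract: one measurement, dated 2011-01-01 with value 0.
SAMPLE_CSV = "id,date,value\nm1,2011-01-01,0\n"


def convert(csv_text):
    """Convert CSV rows to N-Triples lines, sorted so repeated runs give identical output."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<" + BASE + row["id"].strip() + ">"
        date = row["date"].strip()
        value = row["value"].strip()
        lines.append('{} <{}date> "{}"^^<{}date> .'.format(subject, VOCAB, date, XSD))
        lines.append('{} <{}value> "{}"^^<{}decimal> .'.format(subject, VOCAB, value, XSD))
    return "\n".join(sorted(lines))


if __name__ == "__main__":
    print(convert(SAMPLE_CSV))
```

Sorting the output is a design choice: it makes two runs over the same input byte-for-byte identical, so changes between releases show up cleanly in a diff.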

Take an iterative approach

1. Review modeling decisions
2. Review vocabularies chosen and developed
3. Modify/update data conversion scripts
4. Do a maintenance walk-through with real use cases
5. Show how to explore data with SPARQL and visualizations
6. Discuss a persistent identifier strategy (think PURLs)

Technical DNA of EPA Linked Data Services

• Built on Open Source Software
• Provides downloadable Linked Open Data (RDF, JSON-LD)
• Developer guide includes RESTful API, persistent URLs strategy
• Sample apps on GitHub (https://github.com/USEPA)

Power of LOD

Combining data sets in a day with Linked Open Data from DBpedia & EPA. Next, the EPA wanted more chemical data linked to their data…
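Combining datasets is quick because two publishers that use the same URI for a thing are already talking about the same node, so merging is a set union. The sketch below illustrates the idea only and does not use EPA data: it splits the UQ facts from the earlier slides across two hypothetical sources, and it assumes `http://dbpedia.org/resource/University_of_Queensland` is the DBpedia identifier for UQ. The names `merge` and `describe` are made up for this example.

```python
DBO = "http://dbpedia.org/ontology/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Assumption: the DBpedia identifier for UQ; both sources use it as the shared key.
UQ = "http://dbpedia.org/resource/University_of_Queensland"
G8 = "http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)"

# Hypothetical source A: descriptive facts.
SOURCE_A = {
    (UQ, RDFS + "label", "The University of Queensland"),
    (UQ, DBO + "affiliation", G8),
}

# Hypothetical source B: enrolment figures for the same URI.
SOURCE_B = {
    (UQ, DBO + "numberOfStudents", 48771),
    (UQ, DBO + "numberOfUndergraduateStudents", 34228),
}


def merge(*graphs):
    """Merge graphs by set union; identical triples collapse automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect every predicate and its set of objects for one subject."""
    description = {}
    for s, p, o in graph:
        if s == subject:
            description.setdefault(p, set()).add(o)
    return description


if __name__ == "__main__":
    for predicate, objects in sorted(describe(merge(SOURCE_A, SOURCE_B), UQ).items()):
        print(predicate, objects)
```

No join keys or schema mapping are needed here; the shared HTTP URI does that work.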

Specialist knowledge as Linked Open Data

PubChem, the world’s largest open molecular database

Used by the healthcare / life sciences industry worldwide - all Linked Open Data

Shared vocabularies are the “lingua franca” of data interoperability. These include SKOS, RDFS and OWL; other key vocabularies include Dublin Core, Geo, FOAF, ORG and vCard.

https://opendata.epa.gov

[Architecture diagram]
• Actors: Public, Registered developer, Contractor (3 Round Stones, Inc.)
• Clients: Web Browser; Application, Script or automated client
• Interfaces: SPARQL endpoint, REST API, Resource URIs
• Linked Data management system located at a Tier 1 Cloud Provider (FISMA compliant)
• RDF Database

Linked Data is a gift …

• A worldwide system of linked information systems
• Global addressing scheme for data integration that scales to the Web
• Nearly immediate data integration to billions of facts

http://LinkedDataDeveloper.com

How do I get started?

https://www.w3.org/TR/ld-bp/

https://www.w3.org/2012/ldp/charter

Enterprise data interoperability

Use your super powers for good!

@WhoGiveACrapTP

http://w3id.org/people/bernhyland/presentations

Twitter: @BernHyland
Email: bhyland@3roundstones.com