Date post: | 12-Apr-2017 |
Category: |
Data & Analytics |
Upload: | bernadette-hyland |
Extend Your Reach.
Linking Open Government Data at Scale
YOW! 2016 Conference: Melbourne December 1-2 ~ Brisbane December 5-6 ~ Sydney December 8-9
Bernadette Hyland, CEO & co-founder, 3 Round Stones, Inc.
@BernHyland
Some data is expensive to collect …
Data on the Web today
Linked data is intentionally for reuse
Linked Data
Linked Data refers to a set of best practices for publishing and interlinking data for access by both humans and machines. It relies on the RDF family of syntaxes (e.g., JSON-LD, N3, Turtle) and HTTP URIs.
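As a rough illustration of "RDF syntax plus HTTP URIs", here is a minimal sketch of a JSON-LD description built with Python's standard json module. It uses the University of Queensland example from the slides that follow. The subject URI and the dbo:University type are assumptions made for illustration; the dbo: namespace and the Group of Eight URI are the ones used in the deck's SPARQL query.

```python
import json

# Illustrative JSON-LD document. The "@id" of the subject and the
# dbo:University type are assumptions for this sketch.
doc = {
    "@context": {
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "dbo": "http://dbpedia.org/ontology/",
        "label": "rdfs:label",
        # "@type": "@id" marks the value as a link to another resource,
        # not a plain string.
        "affiliation": {"@id": "dbo:affiliation", "@type": "@id"},
    },
    "@id": "http://dbpedia.org/resource/University_of_Queensland",
    "@type": "dbo:University",
    "label": "The University of Queensland",
    "affiliation": "http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)",
}

serialized = json.dumps(doc, indent=2)
print(serialized)
```

Every name in the document is either an HTTP URI or a term that the @context maps to one, which is what lets both people and programs follow the links.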
Linked Data can be published by a person or organization behind the firewall or on the public Web.
Linked Data published on the public Web is generally called Linked Open Data.
- W3C Linked Data Glossary
Something --a relationship--> Something else
UQ --is a--> University
UQ --label--> The University of Queensland
UQ --is a--> University
UQ --affiliation--> Group of 8
UQ --label--> The University of Queensland
UQ --affiliation--> Group of 8
UQ --number of undergraduate students--> 34228
UQ --number of students--> 48771
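The diagram is a set of subject / relationship / object statements. A minimal sketch, assuming short names stand in for full HTTP URIs, of how such statements can be held and searched in Python (the match helper is hypothetical, not part of any library):

```python
# Each fact is a (subject, predicate, object) triple, mirroring the diagram.
# Short names are used instead of full HTTP URIs to keep the sketch readable.
triples = [
    ("UQ", "is a", "University"),
    ("UQ", "label", "The University of Queensland"),
    ("UQ", "affiliation", "Group of 8"),
    ("UQ", "number of undergraduate students", 34228),
    ("UQ", "number of students", 48771),
]


def match(store, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


print(match(triples, p="affiliation"))
```

Adding a new fact means appending one more triple; no schema change is needed, which is the property the slides build on.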
credit: http://json-ld.org/
# G8 universities ordered by the number of students
# at each university.
PREFIX dbo: <http://dbpedia.org/ontology/>
# rdfs: is declared explicitly so the query does not depend on
# prefixes predefined by the endpoint.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?name ?students ?undergrads
WHERE {
  ?s dbo:affiliation <http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)> .
  ?s rdfs:label ?name .
  OPTIONAL { ?s dbo:numberOfStudents ?students }
  OPTIONAL { ?s dbo:numberOfUndergraduateStudents ?undergrads }
  FILTER ( lang(?name) = "en" )
}
ORDER BY DESC(?students)
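A query like this is typically sent to a SPARQL endpoint over HTTP. The sketch below, using only the Python standard library, builds such a request and flattens a results document in the standard SPARQL JSON results format. The endpoint URL is an assumption (the public DBpedia endpoint), and build_request, fetch, and rows are hypothetical helper names. The request is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public DBpedia endpoint

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?name ?students WHERE {
  ?s dbo:affiliation <http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)> .
  ?s rdfs:label ?name .
  OPTIONAL { ?s dbo:numberOfStudents ?students }
  FILTER ( lang(?name) = "en" )
} ORDER BY DESC(?students)
"""


def build_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def fetch(request):
    """Send the request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def rows(results_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results_json["results"]["bindings"]
    ]


request = build_request(ENDPOINT, QUERY)
# To run it for real: print(rows(fetch(request)))
```

Variables left unbound by an OPTIONAL clause are simply absent from that row's dict, so callers should use .get() when reading them.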
[Diagram: a measurement (date 2011-01-01, value 0, units of measure degrees Centigrade, ...) collected by a collector (a Person; first name Michael, last name Hausenblas) and collected at Galway Airport. The linked resources can live in my data or in Linked Data on the Web.]
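The measurement graph above can be written out as N-Triples, one statement per line. This is a minimal sketch: every URI sits under a hypothetical http://example.org/ namespace, and the property names (collectedBy, collectedAt, and so on) are invented for illustration. In a real data set, the Galway Airport URI could just as well point at a resource someone else has already published on the Web.

```python
BASE = "http://example.org/"  # hypothetical namespace for this sketch


def iri(local):
    """Wrap a local name as an N-Triples IRI under BASE."""
    return f"<{BASE}{local}>"


def literal(value):
    """Render a value as a plain N-Triples string literal."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


measurement = [
    (iri("measurement/1"), iri("collectedBy"), iri("person/michael")),
    (iri("measurement/1"), iri("collectedAt"), iri("place/galway-airport")),
    (iri("measurement/1"), iri("date"), literal("2011-01-01")),
    (iri("measurement/1"), iri("value"), literal(0)),
    (iri("measurement/1"), iri("unitsOfMeasure"), literal("degrees Centigrade")),
    (iri("person/michael"), iri("firstName"), literal("Michael")),
    (iri("person/michael"), iri("lastName"), literal("Hausenblas")),
]

ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in measurement)
print(ntriples)
```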
“Linked Data was part of my initial vision for the Web and is an important part of the Web’s future. The Web
took off as a web of hyperlinked documents which were exciting to read, but which could not be
effectively used as data.”
- Tim Berners-Lee
The Semantic Web morphed when it hit the marketplace
Governments & NGOs publishing & consuming Linked Data
[Figure: snapshots dated 07 Nov 2007, 10 Nov 2007, 28 Feb 2008, 31 Mar 2008, 18 Sep 2008, 05 Mar 2009, 27 Mar 2009, 14 Jul 2009, 22 Sep 2009, 22 Sep 2010]
• Widens EPA’s audience (justifies relevance), for research, environmental justice
• More cost-effective than relational backed web portals
• Used for scientific R&D, green chemistry, ++
• Increased transparency
https://opendata.epa.gov
7 Steps to Publish Linked Data
Source: W3C Best Practices for Publishing Linked Data, see https://www.w3.org/TR/ld-bp/
Step #1 - Identify
Identify the dataset(s) to be modeled
• Request a copy of the logical and physical model of the database(s)
• Obtain data extracts (i.e., databases and/or spreadsheets) or create data in a way that can be replicated.
Step #2 - Model Data
Model data without context to allow for reuse and easier merging of data sets
• Traditional DBAs organize data for specified Web services or applications
• In Linked Data, application logic does not drive the data schema, concepts, etc.
Step #2 - Modeling (cont)
Look for real world objects of interest (e.g., people, places, things, locations) and model them.
• Investigate how others are already modeling similar or related data.
• Look for duplication & normalize the data
• Use common sense to decide whether or not to make a link
Step #2 - Modeling (cont)
• Connect data from different sources & authoritative vocabularies
• Use URIs as names for your objects
• Put aside immediate needs of any application
• Don’t think about how an application will use your data
• Do think about time and how the data will change over time.
Steps #3 & 4 - Name & Describe
Identifiers are at the heart of how things become useful as linked data.
We use the same mechanism for connecting data as the Web: the humble HTTP URI.
The Web is formed by HTTP URIs that are essentially connections linking pieces of information together.
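Naming things with HTTP URIs can be as simple as a repeatable rule that turns a human-readable name into a stable identifier. A minimal sketch, assuming a hypothetical base URI and a hypothetical mint_uri helper; real projects usually agree on a URI pattern before minting anything:

```python
import re
import unicodedata

BASE = "http://example.org/id/"  # hypothetical base URI for this sketch


def mint_uri(kind, name):
    """Derive a stable HTTP URI from a resource type and a display name."""
    # Strip accents, lowercase, and collapse anything non-alphanumeric to "-".
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
    return f"{BASE}{kind}/{slug}"


print(mint_uri("university", "The University of Queensland"))
# http://example.org/id/university/the-university-of-queensland
```

Because the same input always yields the same URI, two data sets converted independently end up naming the same thing the same way.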
Steps #5, 6 & 7 - Convert, Publish & Maintain
5. Write a script or process to convert the data set repeatedly
6. Publish to the Web and announce it!
7. Maintenance strategy
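Step 5 asks for a conversion that can be re-run whenever the source data changes. Below is a minimal sketch of such a script, assuming a CSV extract with an id column. The namespace, the column names, and the sample row are all made up for illustration.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace for this sketch


def convert(csv_text):
    """Convert each CSV row into N-Triples lines.

    Output is sorted so repeated runs on the same extract give
    byte-identical results, which makes changes easy to diff.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}station/{row['id']}>"
        for column, value in row.items():
            if column == "id" or value == "":
                continue  # the id becomes the URI; empty cells yield no triple
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{BASE}{column}> "{escaped}" .')
    return sorted(lines)


extract = "id,name,temperature\n1,Galway Airport,0\n"  # made-up sample row
print("\n".join(convert(extract)))
```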
Take an iterative approach
1. Review modeling decisions
2. Review vocabularies chosen and developed
3. Modify/update data conversion scripts
4. Do a maintenance walk-through with real use cases
5. Show how to explore data with SPARQL and visualizations
6. Discuss a persistent identifier strategy (think PURLs)
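The idea behind a persistent identifier strategy is that the published URI never changes, while the location it redirects to can. A minimal sketch of that indirection follows. The table entries, the resolve helper, and the choice of a 302 status are assumptions for illustration, not a description of any particular PURL service.

```python
# Maps a persistent path to wherever the data currently lives.
# Both the path and the target below are hypothetical.
PURL_TABLE = {
    "/id/university/uq": "http://data.example.org/v2/universities/uq",
}


def resolve(path, table=PURL_TABLE):
    """Return (status, location) for a persistent path."""
    target = table.get(path)
    if target is None:
        return 404, None
    return 302, target  # redirect clients to the current location


print(resolve("/id/university/uq"))
```

When the data moves, only the table entry is updated; every link already published against the persistent path keeps working.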
Technical DNA of EPA Linked Data Services
• Built on Open Source Software
• Provides downloadable Linked Open Data (RDF, JSON-LD)
• Developer guide includes RESTful API, persistent URLs strategy
• Sample apps on GitHub (https://github.com/USEPA)
Power of LOD
Combining data sets in a day with Linked Open Data from DBpedia & EPA. Next, the EPA wanted more chemical data linked to their data…
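Combining data sets is quick because both sources name the same thing with the same URI, so a merge is little more than a set union. A minimal sketch with made-up data: the URI, property names, and values are hypothetical and do not come from DBpedia or the EPA.

```python
from collections import defaultdict

PLACE = "http://example.org/place/springfield"  # hypothetical shared URI

# Two independently published sets of triples about the same resource.
source_a = {(PLACE, "label", "Springfield"), (PLACE, "population", 1000)}
source_b = {(PLACE, "facilityCount", 3)}


def merge(*graphs):
    """Union the graphs, then group properties by subject URI."""
    merged = set().union(*graphs)
    by_subject = defaultdict(dict)
    for s, p, o in merged:
        # Simplification: a repeated property keeps only one of its values.
        by_subject[s][p] = o
    return dict(by_subject)


combined = merge(source_a, source_b)
print(combined[PLACE])
```

No join keys or schema mapping were needed; the shared identifier did the integration work.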
Specialist knowledge as Linked Open Data
PubChem, the world’s largest open molecular database
Used by healthcare / life sciences industry worldwide - all Linked Open Data
Shared vocabularies, including SKOS, RDFS, and OWL, are the “lingua franca” of data interoperability. Other key vocabularies include Dublin Core, Geo, FOAF, ORG, and vCard.
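Shared vocabularies work because a short prefixed name such as foaf:name always expands to the same global URI. A minimal sketch of that expansion, using the published namespace URIs of the vocabularies named above. Reading "Geo" as the W3C Basic Geo (WGS84) vocabulary and "Dublin Core" as DCMI Terms is an assumption, and expand is a hypothetical helper.

```python
# Namespace URIs of the vocabularies named on the slide.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "org": "http://www.w3.org/ns/org#",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",  # assumed: W3C Basic Geo
}


def expand(curie):
    """Expand a prefixed name like 'foaf:name' into a full URI."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIXES or not local:
        raise ValueError(f"cannot expand {curie!r}")
    return PREFIXES[prefix] + local


print(expand("foaf:name"))
# http://xmlns.com/foaf/0.1/name
```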
[Architecture diagram]
Actors: Contractor (3 Round Stones, Inc.), Public, Registered developer
Clients: Application, Script or automated client; Web Browser
Interfaces: SPARQL endpoint, REST API, Resource URIs
Linked Data management system located at a Tier 1 Cloud Provider (FISMA compliant), with an RDF Database
Linked Data is a gift …
• A worldwide system of linked information systems
• Global addressing scheme for data integration that scales to the Web
• Nearly immediate data integration to billions of facts
http://LinkedDataDeveloper.com
http://www.oreilly.com/data/free/files/the-global-impact-of-open-data.pdf
How do I get started?
https://www.w3.org/TR/ld-bp/
https://www.w3.org/2012/ldp/charter
Enterprise data interoperability
Use your super powers for good!
@WhoGiveACrapTP