Linked Data Approach for Integration of Human Health & Environmental Data

Post on 07-May-2015

Best practices and platforms for access and reuse of scientific data and models. We explore a Linked Data approach for data integration, modeling and interoperability. Delivered by Bernadette Hyland at the EPA & Society of Toxicology Scientific Workshop titled "Building for Better Decisions: Multi-scale Integration of Human Health and Environmental Data" on 8-May-2012 at EPA Research Triangle Park, NC, USA.

Linked Data Approach for Integration of Human Health and Environmental Data

Building for Better Decisions: Multi-scale Integration of Human Health and Environmental Data

8-11 May 2012

By: Bernadette Hyland, Chair, W3C Government Linked Data WG

CEO, 3 Round Stones, Inc

Email: bhyland@3roundstones.com

Twitter: @BernHyland

This presentation: http://slideshare.net/3roundstones

1Tuesday, May 8, 12

• Linked Data is about publishing and consuming data using international data standards

• Based on a 20-year-old idea

• A system of linked information systems

Photo credit: http://www.flickr.com/photos/sjungling/5974860/

A HISTORY OF SILOS

• 1970s: A neat little package ($ cat foo.txt | grep blah | sort)

• 1980s: Client-Server

• 1990s: The Early Web

There is a better way to connect data ...

• No one vendor owns it

• It scales ... to Web-scale

• Doesn’t require a super model

• Based on International Data Exchange Standards (RDF, SPARQL)

What is next for Data on the Web?

• What is next for Open Data on the Web

• Structured data on the Web is quickly becoming mainstream

• Authorities are beginning to appreciate a new way to publish and consume content

“Linked Data means cooperation without coordination”

-- David Wood, PhD

Governments

Goals: Governmental transparency and/or improved internal efficiencies (data warehouses)

Hardware/Software Vendors

Goal: Improve interoperability between products and product lines

Retailers

Goal: Improve click-throughs on search results

Book Publishers

Goals: Improve internal manuscript pipelines, expose additional ways of finding and using content

New Media

Linked Data in Context

• Web

• Universal Client

• Universal Connection

• Universal Database

• Logic and interlinking

• Ubiquitous, reusable applications

• URL Curation of Data

Why is RDF important?

• It is an international standard for publishing data on the Web (public and private)

• Data exchange model

• Serializations include RDF/XML, N-Triples, N3, Turtle ...

• It is the future of using the Web
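To make the serialization bullet concrete, here is a minimal sketch that writes two statements as N-Triples, one of the RDF serializations listed above. It is an illustration under stated assumptions, not part of the original slides: the `example.org` URIs and the dataset they describe are hypothetical placeholders, the helper names `escape_literal` and `triple` are invented for this example, and only the Dublin Core Terms namespace is a real vocabulary. A production system would more likely use an RDF library than string formatting.

```python
# Minimal sketch: emit RDF statements as N-Triples using only the standard library.
# All example.org URIs are hypothetical placeholders.

DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core Terms namespace


def escape_literal(text):
    """Escape the characters that must be escaped inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def triple(subject, predicate, obj, is_literal=False):
    """Format one subject-predicate-object statement as an N-Triples line."""
    if is_literal:
        obj_term = '"' + escape_literal(obj) + '"'
    else:
        obj_term = "<" + obj + ">"
    return "<{}> <{}> {} .".format(subject, predicate, obj_term)


dataset = "http://example.org/dataset/air-quality"
lines = [
    # A literal-valued statement: the dataset has a title.
    triple(dataset, DCTERMS + "title", "Air quality readings", is_literal=True),
    # A link to another resource: the dataset has a publisher.
    triple(dataset, DCTERMS + "publisher", "http://example.org/org/agency"),
]
print("\n".join(lines))
```

The second statement is what makes the data "linked": its object is another URI that a consumer can follow, rather than a plain string.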

What you can do ...

• Good = Use Data Standards (RDF) to publish metadata about data and models, at a minimum

• Better = Use RDF to publish all your data

• Best = Link your data + models

• Web architecture, Web-scale

WE’VE SEEN THIS BEFORE

• EMR Data: Clinical Condition Specific

• Internal Portal Data: Physicians, Services, Locations

• Linked Data Cloud: DBpedia, PubMed, NLM

• Open Government Data: CDC, EPA, US Census

• Social Media: Facebook, Twitter

• Clinical Ontology

• Business Ontology

Value Proposition

• Decrease costly emergency department visits

• Reduce hospital re-admissions after treatment

• Improve self-care and medication compliance

• Educate patients about triggers and disease management

Functional Model

1. Define target population and clinical data from electronic medical record

2. Identify sources of open government data related to environmental, weather, and other variables related to chronic pulmonary disease exacerbations

3. Combine open content from NLM, PubMed, Medline to support education

4. Leverage a Linked Data approach, using Open Source and international data exchange standards (RDF)

5. Alert patient of possible hazardous conditions and recommend appropriate actions
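The core of the functional model (steps 1, 2 and 5) can be sketched as a join between a patient cohort and environmental readings that share a location key. This is a hedged illustration only: the patient records, ZIP codes, readings, the alert threshold of 101, the set of pulmonary conditions, and the function name `draft_alerts` are all assumptions made for this example and do not come from the presentation.

```python
# Minimal sketch of steps 1, 2 and 5: join a patient cohort with air quality
# readings by location and draft alerts. All records below are made up.

# Step 1: target population drawn from an anonymized medical record extract.
patients = [
    {"id": "p1", "zip": "27709", "condition": "asthma"},
    {"id": "p2", "zip": "27514", "condition": "COPD"},
    {"id": "p3", "zip": "27709", "condition": "diabetes"},
]

# Step 2: open environmental data keyed by the same location identifier.
air_quality_index = {"27709": 135, "27514": 42}

# Assumed threshold and cohort; a real deployment would set these clinically.
AQI_ALERT_THRESHOLD = 101
PULMONARY_CONDITIONS = {"asthma", "COPD"}


def draft_alerts(patients, aqi_by_zip, threshold=AQI_ALERT_THRESHOLD):
    """Return one alert per pulmonary patient whose area meets the threshold."""
    alerts = []
    for patient in patients:
        if patient["condition"] not in PULMONARY_CONDITIONS:
            continue
        aqi = aqi_by_zip.get(patient["zip"])
        if aqi is not None and aqi >= threshold:
            alerts.append(
                {
                    "patient": patient["id"],
                    "message": "Air quality index is {} in your area; "
                    "consider limiting outdoor activity.".format(aqi),
                }
            )
    return alerts


# Step 5: hand the drafted alerts to whatever delivery channel is in use.
for alert in draft_alerts(patients, air_quality_index):
    print(alert["patient"], "->", alert["message"])
```

The join works only because both sources use the same identifier for a place, which is the practical argument for shared URIs and vocabularies in step 4.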

Leverage Linked Data, Open Source & Standards

[Diagram labels: CDC, EPA, US Census, DBpedia, PubMed, NLM, Web of Data, EMR, SMS, Email, Web]

Shows:

1) Air Quality data from US EPA

2) Anonymized EMR data

3) Doctor’s details from CSV file

Uses Callimachus, a Linked Data Management Platform
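As a rough illustration of how tabular data such as the doctors' CSV file can be brought into a Linked Data setting, the sketch below turns each row into subject-predicate-object statements. It is generic Python and does not use or represent the Callimachus API; the CSV columns, the sample rows, the `example.org` URI base and vocabulary, and the function name `rows_to_triples` are all hypothetical.

```python
import csv
import io

# Hypothetical CSV export of physician details; column names are assumptions.
CSV_TEXT = """id,name,specialty
d1,Dr. A. Example,Pulmonology
d2,Dr. B. Sample,Family Medicine
"""

BASE = "http://example.org/physician/"  # hypothetical URI base for subjects
VOCAB = "http://example.org/vocab#"     # hypothetical vocabulary for predicates


def rows_to_triples(csv_text):
    """Turn each CSV row into (subject, predicate, object) tuples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The id column becomes the subject URI; every other column
        # becomes one statement about that subject.
        subject = BASE + row["id"]
        for column, value in row.items():
            if column == "id":
                continue
            triples.append((subject, VOCAB + column, value))
    return triples


for s, p, o in rows_to_triples(CSV_TEXT):
    print(s, p, repr(o))
```

Once each row has a URI, statements from other sources (for example, which patients a physician treats) can refer to the same resource without a shared database schema.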

Tools & best practices?

• Large and small vendors are involved in Linked Data

• From Oracle and IBM to 3 Round Stones

• For a listing of active projects, companies and research, see http://dir.w3.org/

• For best practices, see http://www.w3.org/2011/gld/charter

• Callimachus is a framework for data-driven applications based on Linked Data principles

• Callimachus allows Web developers to easily create data-driven applications for the Web

• It is Open Source (FLOSS)

• http://callimachusproject.org

http://www.w3.org/2011/gld/charter

DELIVERABLES

• Community Directory

• Best Practices for Publishing Linked Data: procurement, vocabulary selection, URI construction, versioning, stability, legacy data issues

• Cookbook for Linked Open Data

• Standard Vocabularies: metadata, statistical “Cube” data, people, organizational structures

Recommendations

• Be prepared for the scientific community & public to demand that your data be published in a reusable format (RDF)

• Demand your vendors use Open Source whenever possible

• Incentivize industry & STM publishers to do the right thing

• Open vs. proprietary technologies & data formats ... be OPEN

• Beware of semantic “pixie dust” - be “an educated consumer” (and scientist!)

• Solutions must embrace International Standards and published Best Practices (W3C, OMG, IETF)

• Define a URI Policy and Strategy, document it and ensure scientists use it!

• Leverage the work of others and work cooperatively...

• Our future is all connected through your work...
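The recommendation to define and document a URI policy can be made enforceable in code. The sketch below is one possible shape, not a prescribed standard: the pattern `http://{authority}/id/{type}/{slug}`, the placeholder domain `data.example.org`, and the function names `slugify`, `mint_uri` and `follows_policy` are all assumptions chosen for illustration.

```python
import re

# Hypothetical policy: http://{authority}/id/{type}/{slug}, lowercase,
# hyphen-separated. The domain below is a placeholder.
AUTHORITY = "data.example.org"


def slugify(label):
    """Lowercase the label and collapse runs of non-alphanumerics to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def mint_uri(resource_type, label):
    """Build an identifier URI that follows the policy above."""
    return "http://{}/id/{}/{}".format(
        AUTHORITY, slugify(resource_type), slugify(label)
    )


def follows_policy(uri):
    """Check an existing URI against the same pattern."""
    pattern = r"^http://{}/id/[a-z0-9-]+/[a-z0-9-]+$".format(re.escape(AUTHORITY))
    return re.match(pattern, uri) is not None


print(mint_uri("Monitoring Site", "Research Triangle Park #3"))
```

Keeping the minting function and the validation check side by side means the written policy and the identifiers scientists actually publish are less likely to drift apart.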

This work is Copyright © 2011-2012 3 Round Stones Inc.

It is licensed under the Creative Commons Attribution 3.0 Unported License.

Full details at: http://creativecommons.org/licenses/by/3.0/

You are free:

to Share — to copy, distribute and transmit the work

to Remix — to adapt the work

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.
