07 chris davis

Post on 19-Jan-2015


description

Presentation at Appsforenergy 2013, Take the challenge!

transcript

Challenges and Opportunities of Linked Open Energy Data

Chris Davis

http://enipedia.tudelft.nl

c.b.davis@tudelft.nl

Who am I?

● Postdoc Energy & Industry, TBM, TU Delft

● Focus on Industrial Ecology, Open Data, Collaborative Software, Modeling, Visualization, Analytics, etc.

Motivations

● Energy and sustainability are some of the most important topics of the 21st century

● Need both aggregated and fine-grained data

● Research can be data intensive

● There's a lot out there, but connecting it is tedious

● Researchers often duplicate effort

● It would be great to revolutionize how we deal with this data

Information wants to be free because it has become so cheap to distribute, copy, and recombine - too cheap to meter.

Stewart Brand

There's a Tension...

It wants to be expensive because it can be immeasurably valuable to the recipient.

Stewart Brand


That tension will not go away. It leads to endless wrenching debate about price, copyright, “intellectual property,” and the moral rightness of casual distribution, because each round of new devices makes the tension worse, not better.

Stewart Brand


If you cling blindly to the expensive part of the paradox, you miss all the action going on in the free part.

The pressure of the paradox forces information to explore incessantly.

Stewart Brand


Pirolli & Card (2005) The Sensemaking Process and Leverage Points for Analyst Technology as Identified Through Cognitive Task Analysis


Data Collectors

Data Scientists, Statisticians, Researchers

Policy/Decision Makers


A Metaphor for Open Data...

It's about Resource Efficiency

● Information is a resource just as much as physical resources

● ...however, it ideally gets better the more that it is used

● Data quality is (partly) a function of the amount of attention it gets

● Structure leads to benefits, but requires effort – figure out what has most value to the community

Inspiration from Pokemon...

http://www.youtube.com/watch?v=XpvQNn0n_Qw

OpenStreetMap (last 90 days)

http://www.itoworld.com/map/129

enipedia.tudelft.nl/maps

Enipedia.tudelft.nl


About data quality...


A tale of one (or four?) power stations and seven data sets


How the European Commission manages data

Large Combustion Plants Directive

http://ec.europa.eu/environment/air/pollutants/stationary/lcp/legislation.htm

I wish...

Kraftwerk (Anlagennummer: 0001)

Who is this? (EU ETS Data)

Who is this?

● Name: Kraftwerk (Anlagennummer: 0001)

● Account Holder: Felix Schoeller jr. Foto und Spezialpapiere GmbH & Co KG

● address1: Fabrikstraße 1

● address2: Felix Schoeller jr. Foto und Spezialpapiere GmbH & Co. KG

● City: Weißenborn/Erzgeb.

● CountryCode: DE

● InstallationIdentifier: 891

● InstallationName: Kraftwerk (Anlagennummer: 0001)

● MainActivityTypeCode: 1

● MainActivityTypeCodeLookup: Combustion installations with a rated thermal input exceeding 20 MW

● PermitIdentifier: 14310-0300

● ZipCode: 09600

Inspiration

http://www.flickr.com/photos/maxbraun/98688824/

http://www.flickr.com/photos/acme/229065626/

Matching Entities

0001 09600 1 909 anlagenkonto anlagennummer co erzgeb fabrikstrasse felix foto gmbh jr kg kraftwerk schoeller spezialpapiere technocell und weissenborn

09600 1 co erzgeb fabrikstrasse felix foto gmbh jr kg schoeller spezialpapiere und weissenborn werk

49086 burg co felix foto gmbh gretesch jr kg osnabruck schoeller spezialpapiere und

0001 09600 1 909 anlagenkonto anlagennummer co erzgeb fabrikstrasse felix foto gmbh jr kg kraftwerk schoeller spezialpapiere technocell und weissenborn
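The token lists above look like the result of normalizing each record's free-text fields into a bag of unique tokens. The slides do not say how this was done, so the following is only a minimal sketch under assumed rules: lowercase everything, write "ß" as "ss", strip diacritics, and split on anything that is not a letter or digit. The function name `tokenize` and the choice of fields are illustrative, not taken from the presentation.

```python
import re
import unicodedata


def tokenize(*fields):
    """Turn free-text fields into a sorted list of unique, normalized tokens."""
    # Assumed normalization: lowercase and spell out the German sharp s.
    text = " ".join(fields).lower().replace("ß", "ss")
    # Strip diacritics: decompose characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Anything that is not an ASCII letter or digit separates tokens.
    return sorted(set(re.findall(r"[a-z0-9]+", text)))


# Field values copied from the EU ETS record shown earlier in the slides.
record = {
    "InstallationName": "Kraftwerk (Anlagennummer: 0001)",
    "AccountHolder": "Felix Schoeller jr. Foto und Spezialpapiere GmbH & Co KG",
    "address1": "Fabrikstraße 1",
    "City": "Weißenborn/Erzgeb.",
    "ZipCode": "09600",
}

tokens = tokenize(*record.values())
print(" ".join(tokens))
```

Sorting and deduplicating the tokens makes records comparable regardless of which field a word originally appeared in, which helps when different data sets split names and addresses differently.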

https://en.wikipedia.org/wiki/Claude_Shannon

http://en.wikipedia.org/wiki/Self-information
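The pointer to self-information suggests weighting tokens by how rare they are: a token that appears in almost every record ("gmbh", "co") says little about identity, while a rare one ("weissenborn") says a lot. The slides do not give the scoring formula, so this is a hedged sketch of one possible approach: weight each token by -log2 of the fraction of records containing it, then compare two records by the share of total information carried by their common tokens. The names `token_weights` and `match_score` are hypothetical, and a three-record corpus is far too small for real use.

```python
import math
from collections import Counter


def token_weights(records):
    """Self-information of each token: -log2(fraction of records containing it)."""
    doc_freq = Counter(tok for rec in records for tok in set(rec))
    n = len(records)
    return {tok: -math.log2(count / n) for tok, count in doc_freq.items()}


def match_score(a, b, weights):
    """Information in shared tokens divided by information in all tokens."""
    a, b = set(a), set(b)
    shared = sum(weights.get(t, 0.0) for t in a & b)
    total = sum(weights.get(t, 0.0) for t in a | b)
    return shared / total if total else 0.0


# Token lists copied from the "Matching Entities" slide.
records = [
    ("0001 09600 1 909 anlagenkonto anlagennummer co erzgeb fabrikstrasse "
     "felix foto gmbh jr kg kraftwerk schoeller spezialpapiere technocell "
     "und weissenborn").split(),
    ("09600 1 co erzgeb fabrikstrasse felix foto gmbh jr kg schoeller "
     "spezialpapiere und weissenborn werk").split(),
    ("49086 burg co felix foto gmbh gretesch jr kg osnabruck schoeller "
     "spezialpapiere und").split(),
]

weights = token_weights(records)
same_site = match_score(records[0], records[1], weights)   # both in Weissenborn
other_site = match_score(records[0], records[2], weights)  # Weissenborn vs. Osnabruck

print(f"Weissenborn vs. Weissenborn: {same_site:.3f}")
print(f"Weissenborn vs. Osnabruck:   {other_site:.3f}")
```

In this toy corpus the company-name tokens occur in every record and therefore carry zero weight, so the two Weissenborn records score higher than the Weissenborn and Osnabruck pair even though all three share the same company name.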


Current data management practices result in:

Unintentionally Anonymized Open Data

Optimized for Inefficient Maintenance

and an Uphill Battle to Enforce

Principles of Data Integrity

It's power laws all the way down

● Both contributors & data

● Challenge is aligning the two

Officially Curated vs. Crowdsourced Data


● Crowdsourcing generally OK for easily verifiable data

● Officially curated data needed for comprehensive, hard to verify data, small specialized communities

● Crowdsourced data is only possible because of revision control.

How to Measure Data Quality?

Data Quality = Researcher Skill/Experience × # Viewers/Editors × Ease of Independent Verification

Low Editor Diversity

High Editor Diversity


● Eric Raymond – “Given enough eyeballs, all bugs are shallow”

● But... not all eyes are evenly distributed

Big Data (?)


Questions?

Chris Davis

http://enipedia.tudelft.nl

c.b.davis@tudelft.nl