Foundations VI: Provenance

Foundations VI: Provenance. Deborah McGuinness, TA Weijing Chen. Semantic eScience, Week 9, October 31, 2011.
Page 1: Foundations VI:   Provenance

1

Foundations VI: Provenance

Deborah McGuinness

TA Weijing Chen

Semantic eScience

Week 9, October 31, 2011

Page 2: Foundations VI:   Provenance

References

• PML - McGuinness, Ding, Pinheiro da Silva, Chang. PML 2: A Modular Explanation Interlingua. AAAI 2007 Workshop on Explanation-aware Computing, Vancouver, Canada, July 2007. Stanford Tech Report KSL-07-07. http://www.ksl.stanford.edu/KSL_Abstracts/KSL-07-07.html

• Inference Web - McGuinness and Pinheiro da Silva. Explaining Answers from the Semantic Web: The Inference Web Approach. Web Semantics: Science, Services and Agents on the World Wide Web, Special issue: International Semantic Web Conference 2003, edited by K. Sycara and J. Mylopoulos. Volume 1, Issue 4. Journal published Fall 2004. http://www.ksl.stanford.edu/KSL_Abstracts/KSL-04-03.html

• McGuinness, D.L.; Zeng, H.; Pinheiro da Silva, P.; Ding, L.; Narayanan, D.; Bhaowal, M. Investigations into Trust for Collaborative Information Repositories: A Wikipedia Case Study. The Workshop on the Models of Trust for the Web (MTW'06), Edinburgh, Scotland, May 22, 2006. http://www.ksl.stanford.edu/KSL_Abstracts/KSL-06-05.html

• More from http://inference-web.org/wiki/Publications

• Note the LOGD converter generates PML2

Page 3: Foundations VI:   Provenance

3

Semantic Web Methodology and Technology Development Process

• Establish and improve a well-defined methodology vision for Semantic Technology based application development

• Leverage controlled vocabularies, etc.

[Process diagram; recoverable labels: Use Case; Small Team, mixed skills; Analysis; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Use Tools; Science/Expert Review & Iteration; Develop model/ontology; Evaluation]

Page 4: Foundations VI:   Provenance

4

Ingest/pipelines: problem definition

• Data is coming in faster, in greater volumes and outstripping our ability to perform adequate quality control

• Data is being used in new ways and we frequently do not have sufficient information on what happened to the data along the processing stages to determine if it is suitable for a use we did not envision

• We often fail to capture, represent and propagate manually generated information that needs to go with the data flows

• Each time we develop a new instrument, we develop a new data ingest procedure and collect different metadata and organize it differently. It is then hard to use with previous projects

• The task of event determination and feature classification is onerous and we don't do it until after we get the data

Page 5: Foundations VI:   Provenance

20080602 Fox VSTO et al.

5

Page 6: Foundations VI:   Provenance

6

Use cases

• Who (person or program) added the comments to the science data file for the best vignetted, rectangular polarization brightness image from January 26, 2005, 18:49:09 UT, taken by the ACOS Mark IV polarimeter?

• What were the cloud cover and atmospheric seeing conditions during the local morning of January 26, 2005 at MLSO?

• Find all good images on March 21, 2008.

• Why are the quick look images from March 21, 2008, 19:00 UT missing?

• Why does this image look bad?

Page 7: Foundations VI:   Provenance


7

Page 8: Foundations VI:   Provenance


8

Page 9: Foundations VI:   Provenance

9

Provenance

• Origin or source from which something comes, intention for use, who/what it was generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility

• Knowledge provenance; enrich with ontologies and ontology-aware tools

Page 10: Foundations VI:   Provenance

Semantic Technology Foundations

• PML – Proof Markup Language – used for knowledge provenance interlingua

• Inference Web Toolkit – used to manipulate and access knowledge provenance

• OWL-DL ontologies (including SWEET and VSTO ontologies)

• PML - McGuinness, Ding, Pinheiro da Silva, Chang. PML 2: A Modular Explanation Interlingua. AAAI 2007 Workshop on Explanation-aware Computing, Vancouver, Canada, July 2007. Stanford Tech Report KSL-07-07.

• Inference Web - McGuinness and Pinheiro da Silva. Explaining Answers from the Semantic Web: The Inference Web Approach. Web Semantics: Science, Services and Agents on the World Wide Web, Special issue: International Semantic Web Conference 2003, edited by K. Sycara and J. Mylopoulos. Volume 1, Issue 4. Journal published Fall 2004.

Page 11: Foundations VI:   Provenance

Inference Web Explanation Architecture

[Architecture diagram; recoverable labels: WWW Toolkit; Proof Markup Language (PML); Learners; JTP/CWM; SPARK; UIMA; IW Explainer/Abstractor; IWBase; IWBrowser; IWSearch; Trust; Justification; Provenance; KIF/N3; SPARK-L; Text Analytics; IWTrust; provenance registration; search engine based publishing; expert friendly visualization; end-user friendly visualization; trust computation; OWL-S/BPEL; SDS; trace of web service discovery; learning conclusions; trace of task execution; trace of information extraction; theorem prover/rules]

• Semantic Web based infrastructure

• PML is an explanation interlingua

– Represent knowledge provenance (who, where, when…)

– Represent justifications and workflow traces across system boundaries

• Inference Web provides a toolkit for data management and visualization

Page 12: Foundations VI:   Provenance

Global View and More

• Explanation as a graph

• Customizable browser options

– Proof style

– Sentence format

– Lens magnitude

– Lens width

• More information

– Provenance metadata

– Source PML

– Proof statistics

– Variable bindings

– Link to tabulator

– …

[Diagram: Views of Explanation. An explanation (in PML) can be viewed as: filtered, focused, global, abstraction, discourse, provenance, trust]

Page 13: Foundations VI:   Provenance

Provenance View

• Source metadata: name, description, …

• Source-Usage metadata: which fragment of a source has been used when


Page 14: Foundations VI:   Provenance

Trust Tab

Fragment colored by trust value

Detailed trust explanation

Trust View

• (preliminary) simple trust representation

• Provides colored (mouseable) view based on trust values

• Enables sharing and collaborative computation and propagation of trust values


Page 15: Foundations VI:   Provenance

Discourse View

• (Limited) natural language interface

• Mixed initiative dialogue

• Exemplified in CALO domain

• Explains task execution component powered by learned and human generated procedures


Page 16: Foundations VI:   Provenance

Selected IW and PML Applications

• Portable proofs across reasoners: JTP (with temporal and context reasoners) (Stanford); CWM (W3C), SNARK (SRI), …

• Explaining web service composition and discovery (SNRC)

• Explaining information extraction (more emphasis on provenance – KANI, UIMA)

• Explaining intelligence analysts’ tools (NIMD/KANI)

• Explaining task processing (SPARK / CALO)

• Explaining learned procedures (TAILOR, LAPDOG / CALO)

• Explaining privacy policy law validation (TAMI)

• Explaining decision making and machine learning (GILA)

• Explaining trust in social collaborative networks (TrustTab)

• Registered knowledge provenance: IW Registrar (Explainable Knowledge Aggregation)

• Explaining natural science provenance – VSTO, SPCDIS,

Page 17: Foundations VI:   Provenance

PML1 vs. PML2

• PML1 was introduced in 2002

– It has been used in multiple contexts ranging from explaining theorem provers to text analytics to machine learning.

– It was specified as a single ontology

• PML2 improves PML1 by

– Adopting a modular design: splitting the original ontology into three pieces: provenance, justification, and trust

    • This improves reusability, particularly for applications that only need certain explanation aspects, such as provenance or trust.

– Enhancing explanation vocabulary and structure

    • Adding new concepts, e.g. information

    • Refining explanation structure

Page 18: Foundations VI:   Provenance

PML Provenance Ontology

• Scope: annotating provenance metadata

• Highlights

– Information

– Source Hierarchy

– Source Usage

Page 19: Foundations VI:   Provenance

Referencing, Encoding and Annotating a Piece of Information

• Referencing a piece of information

– Using a URI

• Encoding the content of information

– Complete quote:

<hasRawString>(type TonysSpecialty SHELLFISH)</hasRawString>

– Obtained from URL:

<hasURL>http://inference-web.org/ksl/registry/storage/documents/tonys_fact.kif</hasURL>

• Annotations

– For human consumption:

<hasPrettyString>Tonys’ Specialty is ShellFish</hasPrettyString>

– For machine consumption

    • Language:

<hasLanguage rdf:resource="http://inference-web.org/registry/LG/KIF.owl#KIF" />

    • Format:

<hasFormat rdf:resource="http://inference-web.org//registry/FM/PDF.owl#PDF" />
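The annotations above can be read back with a few lines of code. The following is a minimal sketch using only the Python standard library. The `pmlp` namespace URI, the wrapping `pmlp:Information` element and the `#info1` identifier are assumptions made for illustration; they are not taken from the slides.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed namespace for the PML 2 provenance ontology; check it against the ontology you use.
PMLP_NS = "http://inference-web.org/2.0/pml-provenance.owl#"

# Hypothetical wrapper around the property values shown on the slide.
PML_FRAGMENT = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:pmlp="{PMLP_NS}">
  <pmlp:Information rdf:about="#info1">
    <pmlp:hasRawString>(type TonysSpecialty SHELLFISH)</pmlp:hasRawString>
    <pmlp:hasURL>http://inference-web.org/ksl/registry/storage/documents/tonys_fact.kif</pmlp:hasURL>
    <pmlp:hasLanguage rdf:resource="http://inference-web.org/registry/LG/KIF.owl#KIF"/>
  </pmlp:Information>
</rdf:RDF>
""".strip()


def read_information(xml_text):
    """Extract the reference, content and language annotation of one Information node."""
    root = ET.fromstring(xml_text)
    info = root.find(f"{{{PMLP_NS}}}Information")

    def text_of(name):
        node = info.find(f"{{{PMLP_NS}}}{name}")
        return node.text.strip() if node is not None and node.text else None

    language = info.find(f"{{{PMLP_NS}}}hasLanguage")
    return {
        "id": info.get(f"{{{RDF_NS}}}about"),
        "raw": text_of("hasRawString"),
        "url": text_of("hasURL"),
        "language": language.get(f"{{{RDF_NS}}}resource") if language is not None else None,
    }


info = read_information(PML_FRAGMENT)
print(info)
```

A full RDF library would handle arbitrary serializations; plain XML parsing is used here only to keep the sketch dependency-free.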

Page 20: Foundations VI:   Provenance

Source Hierarchy

• Source is the container of information

• Our source hierarchy offers

– Many well-known sources such as

    • Sensor (e.g. geo-science)

    • InferenceEngine (e.g. reasoner)

    • WebService (e.g. workflow)

– Finer granularity of source than just document

    • DocumentFragment (for text analytics)
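As a rough illustration of the idea, the hierarchy can be mirrored with ordinary class inheritance. This is a sketch, not the PML ontology itself: all classes are placed directly under `Source` for simplicity (the real ontology may have intermediate classes), and the `Document` class, the character-offset addressing and the sample text are assumptions.

```python
class Source:
    """Root of the hierarchy: a container of information."""

    def __init__(self, name):
        self.name = name


class Sensor(Source):
    """For example, a geo-science instrument."""


class InferenceEngine(Source):
    """For example, a reasoner."""


class WebService(Source):
    """For example, a step in a workflow."""


class Document(Source):
    """A whole document with its text (hypothetical helper class)."""

    def __init__(self, name, text):
        super().__init__(name)
        self.text = text


class DocumentFragment(Source):
    """A span of a document, addressed here by character offsets (an assumption)."""

    def __init__(self, name, document, start, end):
        super().__init__(name)
        self.document = document
        self.start = start
        self.end = end

    def content(self):
        return self.document.text[self.start:self.end]


doc = Document("tonys_fact", "Tony's specialty is shellfish.")
start = doc.text.index("shellfish")
fragment = DocumentFragment("tonys_fact#specialty", doc, start, start + len("shellfish"))
print(type(fragment).__name__, repr(fragment.content()))
```

Pointing at a fragment rather than the whole document is what lets text analytics say exactly which span a conclusion came from.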

Page 21: Foundations VI:   Provenance

Source Usage

• Source Usage

– logs the action that accesses a source at a certain dateTime to retrieve information

– is part of PML1

• Example: Source #ST was accessed on a certain date

<pmlp:SourceUsage rdf:about="#usage1">
  <pmlp:hasUsageDateTime>2005-10-17T10:30:00Z</pmlp:hasUsageDateTime>
  <pmlp:hasSource rdf:resource="#ST"/>
</pmlp:SourceUsage>
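A consumer of such a record typically wants the source reference and a proper timestamp. The sketch below parses the example with the Python standard library. The `pmlp` namespace URI and the enclosing `rdf:RDF` element are assumptions added so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed namespace for the PML 2 provenance ontology.
PMLP_NS = "http://inference-web.org/2.0/pml-provenance.owl#"

SOURCE_USAGE = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:pmlp="{PMLP_NS}">
  <pmlp:SourceUsage rdf:about="#usage1">
    <pmlp:hasUsageDateTime>2005-10-17T10:30:00Z</pmlp:hasUsageDateTime>
    <pmlp:hasSource rdf:resource="#ST"/>
  </pmlp:SourceUsage>
</rdf:RDF>
""".strip()


def read_source_usage(xml_text):
    """Return (source reference, access time in UTC) for the first SourceUsage node."""
    root = ET.fromstring(xml_text)
    usage = root.find(f"{{{PMLP_NS}}}SourceUsage")
    stamp = usage.find(f"{{{PMLP_NS}}}hasUsageDateTime").text.strip()
    # The trailing "Z" marks UTC; this sketch handles only that form.
    when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    source = usage.find(f"{{{PMLP_NS}}}hasSource").get(f"{{{RDF_NS}}}resource")
    return source, when


source, when = read_source_usage(SOURCE_USAGE)
print(source, when.isoformat())
```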

Page 22: Foundations VI:   Provenance

PML Justification Ontology

• Scope: annotating justification process

• Highlights

– Template for question-answer/justification

– Four types of justification

Page 23: Foundations VI:   Provenance

Four Types of Justification

Goal: conclusion without justification

Assumption: conclusion assumed (using the Assumption rule), asserted by an InferenceEngine, no antecedent

Direct Assertion: conclusion directly asserted (using the DirectAssertion rule) by an InferenceEngine, no antecedent

Regular: conclusion derived from antecedent conclusions
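The four cases can be told apart from two pieces of information: which rule (if any) was applied, and whether there are antecedents. The following sketch encodes that decision. The `Step` class and its field names are hypothetical and do not come from the PML ontology; only the rule names `Assumption` and `DirectAssertion` are taken from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    """One node of a justification (hypothetical structure for illustration)."""
    conclusion: str
    rule: Optional[str] = None  # None means no justification was recorded
    antecedents: List["Step"] = field(default_factory=list)


def justification_type(step):
    """Classify a step into one of the four justification types."""
    if step.antecedents:
        return "Regular"           # derived from antecedent conclusions
    if step.rule is None:
        return "Goal"              # conclusion without justification
    if step.rule == "Assumption":
        return "Assumption"        # assumed, no antecedent
    if step.rule == "DirectAssertion":
        return "Direct Assertion"  # directly asserted, no antecedent
    raise ValueError(f"rule {step.rule!r} without antecedents is not covered")


fact = Step("(type TonysSpecialty SHELLFISH)", rule="DirectAssertion")
guess = Step("(guest likes seafood)", rule="Assumption")
derived = Step("(serve white wine)", rule="ModusPonens", antecedents=[fact, guess])
goal = Step("(what wine to serve?)")

for step in (goal, guess, fact, derived):
    print(justification_type(step), "-", step.conclusion)
```

The rule name `ModusPonens` and the sample conclusions are illustrative only.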

Page 24: Foundations VI:   Provenance

PML Trust Ontology

• Scope: annotate trust and belief assertions

• Highlights

– Extensible trust representation (users may plug in their quantitative metrics using the OWL class inheritance feature)

– Has been used to provide a trust tab filter for Wikipedia – see McGuinness, Zeng, Pinheiro da Silva, Ding, Narayanan, and Bhaowal. Investigations into Trust for Collaborative Information Repositories: A Wikipedia Case Study. WWW2006 Workshop on the Models of Trust for the Web (MTW'06), Edinburgh, Scotland, May 22, 2006.

Page 25: Foundations VI:   Provenance

25

Page 26: Foundations VI:   Provenance


26

Page 27: Foundations VI:   Provenance


27

Quick look browse

Page 28: Foundations VI:   Provenance

28

Page 29: Foundations VI:   Provenance

29

Visual browse

Page 30: Foundations VI:   Provenance

30

Page 31: Foundations VI:   Provenance

31

Page 32: Foundations VI:   Provenance

Search and structured query

32

Search; Structured Query

Page 33: Foundations VI:   Provenance


33

Search

Page 34: Foundations VI:   Provenance

Provenance within SemantAqua

34

Page 35: Foundations VI:   Provenance

SemantAqua System Architecture

[Architecture diagram; recoverable labels: access, Virtuoso]

35

Page 36: Foundations VI:   Provenance

Provenance

• Preserves provenance in the Proof Markup Language (PML).

• Data source level provenance:

– The captured provenance data are used to support provenance-based queries.

• Reasoning level provenance:

– When a water source has been marked as polluted, the user can access supporting provenance data for the explanations, including the URLs of the source data, intermediate data, converted data, and regulatory data.
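One way to picture reasoning level provenance is as a record that travels with each "polluted" finding and lists the supporting URLs by role. The sketch below is an illustration only and not how SemantAqua stores its data: the class, the role names as dictionary keys, the site identifier and the `example.org` URLs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

# The four kinds of supporting data named on the slide, in display order.
ROLES = ("source data", "intermediate data", "converted data", "regulatory data")


@dataclass
class PollutionFinding:
    """A water source marked as polluted, with role -> URL provenance (hypothetical)."""
    water_source: str
    provenance: Dict[str, str]


def supporting_urls(finding):
    """Return (role, url) pairs in a fixed order, skipping roles with no URL."""
    return [(role, finding.provenance[role]) for role in ROLES if role in finding.provenance]


finding = PollutionFinding(
    water_source="site-001",
    provenance={
        "source data": "http://example.org/raw/site-001.csv",
        "intermediate data": "http://example.org/intermediate/site-001.csv",
        "converted data": "http://example.org/rdf/site-001.ttl",
        "regulatory data": "http://example.org/regulations/state-limits.ttl",
    },
)

for role, url in supporting_urls(finding):
    print(f"{role}: {url}")
```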

36

Page 37: Foundations VI:   Provenance

Visualization

37

http://tinyurl.com/iswc-swqp

Page 38: Foundations VI:   Provenance

Visualization

38


Page 39: Foundations VI:   Provenance

Visualization

39


Page 40: Foundations VI:   Provenance

Visualization

40


Page 41: Foundations VI:   Provenance

Visualization

41


Page 42: Foundations VI:   Provenance

Visualization

• Time series visualization:

– Presents data in a time series visualization for the user to explore and analyze

[Chart labels: Limit value: 400; Violation, measured value: 2032.8; Violation, measured value: 971]

http://was.tw.rpi.edu/swqp/trend/epaTrend.html?state=RI&county=1&site=http%3A%2F%2Ftw2.tw.rpi.edu%2Fzhengj3%2Fowl%2Fepa.owl%23facility-110009444869

Page 43: Foundations VI:   Provenance

Selected Results

• Provenance information encoded using semantic web technology supports transparency and trust.

• SemantAqua provides detailed provenance information:

– Original data, intermediate data, data source

• “What if” scenario:

– User can apply a stricter regulation from another state to a local water source.

• User may be interested only in certain sources and can use the interface to control queries
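The "what if" scenario amounts to re-running the same violation check with a different limit. The sketch below assumes a violation means a measured value strictly above the limit. The limit of 400 and the measured values 2032.8 and 971 come from the time series slide; the stricter limit of 300 and the third measurement are hypothetical.

```python
def violations(measurements, limit):
    """Return the measured values that exceed the given limit."""
    return [value for value in measurements if value > limit]


LOCAL_LIMIT = 400.0     # limit value shown on the time series slide
STRICTER_LIMIT = 300.0  # hypothetical limit borrowed from another state

# The first two values appear on the slide; the last one is made up for the example.
measurements = [2032.8, 971.0, 350.0]

local = violations(measurements, LOCAL_LIMIT)
what_if = violations(measurements, STRICTER_LIMIT)
newly_flagged = [value for value in what_if if value not in local]

print("violations under local limit:", local)
print("additional violations under stricter limit:", newly_flagged)
```

Keeping the limit as a parameter is what makes the comparison cheap: the data and its provenance stay the same, and only the regulation applied changes.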

43

Page 44: Foundations VI:   Provenance

Aim at providing at least as much provenance as SemantAqua

• Questions?

44

