Workflows, experimental findings, and their provenance: towards semantically rich linked data and method sharing for collaborative science

Paolo Missier, Newcastle University, UK

Credible workshop, Sophia-Antipolis, October 2012
[Figure: “Virtual experimental science” (DCC’09) [LMB+10]; slide footer: IDCC ’11 - P. Missier et al.]

Outline

1. Workflows and their provenance

– Janus: workflows + provenance + semantics for Taverna (+ LOD) [MLB+12, ZSM+11]

– PROV: the semantic data model for provenance (ongoing)

– PROV-W: unofficial workflow extension (and semantic annotations)

2. Packaging and sharing: data + methods + provenance

– Research Objects in the Wf4Ever project

– DataONE and Data Packages

1: Workflows and provenance

- The Janus provenance ontology for Taverna

- The W3C PROV ontology

- PROV-W

Janus -- IPAW, Troy, NY, June 15-17, 2010

Example workflow (Taverna)

QTL → Ensembl Genes

Ensembl Gene → Uniprot Gene

merge gene IDs

Ensembl Gene → Entrez Gene

Uniprot Gene → Kegg Gene

Entrez Gene → Kegg Gene

Gene → Pathway: path:mmu04210 Apoptosis, path:mmu04010 MAPK, ...

chr: 17, start: 28500000, end: 3000000

Baseline provenance of a workflow run

• The graph encodes all direct data dependency relations, for example:

path:mmu04010 → derivedFrom → mmu:26416

path:mmu04012 → derivedFrom → mmu:12575

[Figure: provenance graph of one execution of the example workflow; nodes shown include path:mmu04010, mmu:26416, path:mmu04012 and mmu:12575]

• Baseline query model: compute paths amongst sets of nodes

• Transitive closure over data dependency relations
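
The baseline query model can be illustrated with a short Python sketch. It is not the Taverna or Janus implementation: the names `derived_from`, `lineage` and `depends_on` are hypothetical, and the nodes `ensembl:EXAMPLE` and `qtl:EXAMPLE` are placeholders added only so that the graph has more than one level. Only the two `path:` edges come from the slide.

```python
from collections import deque

# Direct data dependencies: value -> values it was directly derived from.
# The two path:* edges come from the slide; "ensembl:EXAMPLE" and
# "qtl:EXAMPLE" are hypothetical placeholders that give the graph depth.
derived_from = {
    "path:mmu04010": ["mmu:26416"],
    "path:mmu04012": ["mmu:12575"],
    "mmu:26416": ["ensembl:EXAMPLE"],
    "mmu:12575": ["ensembl:EXAMPLE"],
    "ensembl:EXAMPLE": ["qtl:EXAMPLE"],
}


def lineage(value, edges):
    """Return every value reachable from `value` via derivedFrom edges."""
    seen = set()
    queue = deque([value])
    while queue:
        current = queue.popleft()
        for parent in edges.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def depends_on(target, source, edges):
    """True if a dependency path leads from `target` back to `source`."""
    return source in lineage(target, edges)
```

Calling `lineage("path:mmu04010", derived_from)` collects the transitive closure for one output value, and `depends_on` answers the path question between two given nodes.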

Janus: a semantic provenance model

[Figure: a processor spec P with ports X1, X2, X3 and Y1, Y2, shown next to its execution (processor exec) P_inst, whose ports carry the port values v1, v2, v3 and w1, w2]

Annotated provenance

[Figure: annotated workflow and annotated provenance graph of its execution; annotation labels shown include Kegg and Gene]

Find all genes within the input QTL region that are involved in a given KEGG pathway.

List relevant PubMed publications for the pathways listed in the result set.

From structure annotations to values annotations

[Figure: the processor spec P is annotated as interpro scan and its ports as protein sequence and interpro match report; the same annotations appear on the processor exec P_inst and on its port values v1, v2, v3 and w1, w2]

Annotations propagation rules

Premises: X rdf:type Port; C = {c}; X has value type c; X has value v; v rdf:type PortValue

Conclusion: v rdf:type C

has_value_type denotes data type in the PL sense

[Figure: the annotations on the processor spec P (interpro scan) and its ports (protein sequence, interpro match report) are propagated to the processor exec P_inst and to the port values v1, v2, v3 and w1, w2]
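
The propagation rule can be sketched as a small forward-inference step over triples. This is a minimal illustration under assumptions, not the Janus implementation: it reads C as the set of all types c declared for port X, the predicate strings and the function name `propagate` are hypothetical, and attaching `protein_sequence` to port X1 is chosen only for the example.

```python
RDF_TYPE = "rdf:type"
HAS_VALUE_TYPE = "has_value_type"  # port -> semantic type (structure annotation)
HAS_VALUE = "has_value"            # port -> value observed during a run


def propagate(triples):
    """Return the type triples inferred for port values."""
    triples = set(triples)
    ports = {s for (s, p, o) in triples if p == RDF_TYPE and o == "Port"}
    port_values = {s for (s, p, o) in triples if p == RDF_TYPE and o == "PortValue"}
    inferred = set()
    for port in ports:
        # C: every type declared for this port (assumed reading of the rule).
        types = {o for (s, p, o) in triples if s == port and p == HAS_VALUE_TYPE}
        values = {o for (s, p, o) in triples if s == port and p == HAS_VALUE}
        for value in values & port_values:
            for c in types:
                inferred.add((value, RDF_TYPE, c))
    return inferred


# Hypothetical example: which port carries the protein sequence is an assumption.
triples = [
    ("X1", RDF_TYPE, "Port"),
    ("X1", HAS_VALUE_TYPE, "protein_sequence"),
    ("X1", HAS_VALUE, "v1"),
    ("v1", RDF_TYPE, "PortValue"),
]
```

Here `propagate(triples)` yields a single new triple that types `v1` as `protein_sequence`, which is the step from a structure annotation to a value annotation.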

Annotations as semantic overlay

[Figure: a provenance graph fragment for processor P (ports X1, X2, X3, Y1, Y2) with port values v1 ... vn and w1 ... wm attached through has_port_value. The semantic overlay annotates P as instance-of Pathway search service, with has-input-type Gene and has-output-type Pathway; the values v1 ... vn are instance-of Gene and w1 ... wm are instance-of Pathway, each with has-source Kegg]

Extensions to Linked Data

[Figure: annotated workflow and annotated provenance graph of its execution]

- Publish

- I - Map IDs

- II - query

I - Mapping data values to LoD URIs

In our prototype we map data values to Bio2RDF as follows:

Entrez Genes

Uniprot Genes

KEGG Genes

KEGG Pathways

Linked Data Query example:

List relevant PubMed publications for the pathways listed in the workflow result set
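
The mapping step can be sketched as a simple lookup. The URI patterns shown on the original slide are not preserved in this transcript, so everything below is an assumption: the sketch uses the general Bio2RDF convention `http://bio2rdf.org/<namespace>:<identifier>`, the namespace strings are unverified guesses, and `to_lod_uri` is a hypothetical helper. Identifiers that contain a colon themselves (such as KEGG gene IDs) may need rewriting, which this sketch does not attempt.

```python
BIO2RDF_BASE = "http://bio2rdf.org/"

# Assumed namespace per value type; verify against the Bio2RDF release
# in use before relying on any of these.
NAMESPACES = {
    "Entrez Gene": "geneid",
    "Uniprot Gene": "uniprot",
    "KEGG Gene": "kegg",
    "KEGG Pathway": "kegg",
}


def to_lod_uri(value_type, identifier):
    """Map a workflow data value to a candidate Linked Data URI."""
    namespace = NAMESPACES[value_type]  # KeyError for unmapped value types
    return f"{BIO2RDF_BASE}{namespace}:{identifier}"
```

Once every value in the provenance graph has such a URI, a query like the one above can follow links from pathway URIs to publication records in the Linked Data cloud.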

The PROV ontology from the W3C

• Plus a growing catalogue of examples from group members: http://www.w3.org/2011/prov/wiki/PROV_examples

PROV Ontology (PROV-O)

• “PROV-O defines the normative OWL2 Web Ontology Language encoding of the PROV Data Model” [1]

[1] Current version: http://www.w3.org/TR/prov-o/

http://dvcs.w3.org/hg/prov/raw-file/default/ontology/ProvenanceOntology.owl

State as of October 2012: Last Call, now closed to public comments

Core elements: [figure]

PROV-O encoding: simple example

• Examples drawn from the PROV Primer document

[2] Current editor’s draft: http://dvcs.w3.org/hg/prov/raw-file/default/primer/Primer.html

PROV-N:

    used(ex:compose, ex:dataSet1, -)
    used(ex:compose, ex:regionList, -)
    wasGeneratedBy(ex:composition, ex:compose, -)
    used(ex:illustrate, ex:composition, -)
    wasGeneratedBy(ex:chart1, ex:illustrate, -)

PROV-O (Turtle):

    ex:compose a prov:Activity;
        prov:used ex:dataSet1 ;
        prov:used ex:regionList .
    ex:composition a prov:Entity;
        prov:wasGeneratedBy ex:compose .
    ex:illustrate a prov:Activity;
        prov:used ex:composition .
    ex:chart1 a prov:Entity;
        prov:wasGeneratedBy ex:illustrate .
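
The correspondence between the two serializations can be made explicit with a small translator. This is a sketch under assumptions rather than a PROV-N parser: it only handles `used` and `wasGeneratedBy` statements whose time argument is omitted (`-`), and the name `provn_to_turtle` is hypothetical. It does not emit the `a prov:Activity` or `a prov:Entity` type declarations.

```python
import re

# PROV-N relation name -> PROV-O property. In both relations the first
# argument becomes the subject and the second becomes the object.
PROPERTY = {
    "used": "prov:used",
    "wasGeneratedBy": "prov:wasGeneratedBy",
}

# relation(first, second, -) with the time argument left out.
STATEMENT = re.compile(r"^(\w+)\(\s*([^,\s]+)\s*,\s*([^,\s]+)\s*,\s*-\s*\)$")


def provn_to_turtle(statement):
    """Translate one PROV-N statement into a single Turtle triple."""
    match = STATEMENT.match(statement.strip())
    if match is None:
        raise ValueError(f"unsupported statement: {statement!r}")
    relation, subject, obj = match.groups()
    if relation not in PROPERTY:
        raise ValueError(f"unsupported relation: {relation}")
    return f"{subject} {PROPERTY[relation]} {obj} ."
```

For instance, `provn_to_turtle("used(ex:compose, ex:dataSet1, -)")` produces the triple `ex:compose prov:used ex:dataSet1 .`, matching the Turtle listing.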

Plans in PROV-O

• PROV deliberately does not deal with program structure

– workflow, processor, port

    ex:correct a prov:Activity;
        prov:used ex:dataSet1 .
    ex:edith a prov:Agent, prov:Person .
    ex:instructions a prov:Plan .

    ex:correct prov:qualifiedAssociation [
        a prov:Association ;
        prov:agent ex:edith ;
        prov:hadPlan ex:instructions ] .
    ex:dataSet2 prov:wasGeneratedBy ex:correct .
    ex:dataSet2 prov:wasRevisionOf ex:dataSet1 .

One possible rendering of the Janus example

[Figure: the Janus fragment (processor P with ports X1, X2, X3, Y1, Y2 and port values v1 ... vn, w1 ... wm linked by has_port_value) rendered in PROV: P has prov:type = prov:plan; the activity Pact wasAssociatedWith P-exec and hadPlan P; Pact used V1 with wf:port = "X1"; W1 wasGenBy Pact with wf:port = "Y1"]

PROV-W: Workflow pattern with annotations

    :P a prov:Plan, prov:Entity;
        a :PathwaySearchService .
    :P_exec a prov:Agent .
    :P_act a prov:Activity;
        prov:used :V1;
        prov:qualifiedAssociation [
            a prov:Association;
            prov:agent :P_exec;
            prov:hadPlan :P ];
        prov:qualifiedUsage [
            a prov:Usage;
            prov:entity :V1;
            wf:port "X1" ] .
    :V1 a prov:Entity;
        a :Gene;
        :hasSource :Kegg .
    :W1 a prov:Entity;
        prov:qualifiedGeneration [
            a prov:Generation;
            prov:activity :P_act;
            wf:port "Y1" ];
        a :Pathway;
        :hasSource :Kegg .

[Figure: the PROV rendering of processor P (Pact, P-exec, V1, W1) shown next to the Janus semantic overlay (Pathway search service, Gene, Kegg)]
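
What the `wf:port` attribute adds to plain PROV can be shown with a small in-memory model. The class and method names below are hypothetical and chosen for illustration only; the sketch assumes that each qualified usage or generation records exactly one port name, as in the pattern above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Usage:
    entity: str  # value consumed, e.g. ":V1"
    port: str    # wf:port on the qualified usage, e.g. "X1"


@dataclass(frozen=True)
class Generation:
    entity: str  # value produced, e.g. ":W1"
    port: str    # wf:port on the qualified generation, e.g. "Y1"


@dataclass
class Activity:
    name: str
    plan: str    # prov:hadPlan
    agent: str   # prov:agent
    usages: list = field(default_factory=list)
    generations: list = field(default_factory=list)

    def values_on_port(self, port):
        """Entities used or generated through the named workflow port."""
        records = self.usages + self.generations
        return [record.entity for record in records if record.port == port]


p_act = Activity(
    name=":P_act",
    plan=":P",
    agent=":P_exec",
    usages=[Usage(":V1", "X1")],
    generations=[Generation(":W1", "Y1")],
)
```

With plain `prov:used` and `prov:wasGeneratedBy` an activity only knows which entities it touched; recording the port on the qualified relation lets `p_act.values_on_port("X1")` recover which value flowed through which port.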

The complete suite of PROV specifications

[Figure: the PROV specifications]

2: Packaging and sharing: data + methods + provenance

- The Research Object model from Wf4Ever with: Khalid Belhajjame, University of Manchester

- The DataONE data preservation architecture

Research Objects

[Figure: a Research Object aggregating Datasets, Results, Scientists, Hypothesis, Experiments, Provenance, Electronic paper and Workflows]

RO model specification: http://wf4ever.github.com/ro/

RO primer: http://wf4ever.github.com/ro-primer/

Example

Research Objects as ORE resources

• ORE = Object Reuse and Exchange

– a small vocabulary and patterns for modelling generic “aggregation”

Annotations

Wf4Ever project partners

Acknowledgement


DataONE: Preservation of Observational Data

Data life cycle: Collect, Assure, Describe, Deposit, Preserve, Discover, Integrate, Analyze

Data Life Cycle Tool Support: Kepler, DMP-Tool


Components

Three components for a flexible, scalable, sustainable network

Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
• retain copies of data

Coordinating Nodes
• retain complete metadata catalog
• indexing for search
• network-wide services
• ensure content availability (preservation)
• replication services

Investigator Toolkit


Data Model

Package:
• Data: any data object, with System Metadata
• Science Metadata: XML documents (ISO19115, EML, FGDC, …), with System Metadata
• Resource Map: OAI-ORE RDF, with System Metadata

Granule:
• Manageable by DataONE
• Has unique identifier
• Content does not change
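
The package structure can be sketched as immutable records. This is an illustration under assumptions, not the DataONE API: the class names, the identifiers and the use of SHA-256 as a way to check that content has not changed are all hypothetical choices made for this example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Granule:
    """An immutable object with a unique identifier."""
    identifier: str
    content: bytes

    def checksum(self):
        # SHA-256 is an illustrative choice, not taken from the slides.
        return hashlib.sha256(self.content).hexdigest()


@dataclass(frozen=True)
class DataPackage:
    resource_map: Granule      # relates the metadata to the data it describes
    science_metadata: Granule  # e.g. an EML document
    data: tuple                # one or more data granules

    def members(self):
        return (self.resource_map, self.science_metadata) + tuple(self.data)

    def identifiers_unique(self):
        ids = [granule.identifier for granule in self.members()]
        return len(ids) == len(set(ids))


# Hypothetical identifiers and contents, for illustration only.
package = DataPackage(
    resource_map=Granule("example-map-1", b"<rdf/>"),
    science_metadata=Granule("example-meta-1", b"<eml/>"),
    data=(Granule("example-data-1", b"a,b\n1,2\n"),),
)
```

Freezing the dataclasses mirrors the requirement that a granule's content does not change: an update would be modelled as a new granule with a new identifier.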

Summary: Putting it all together

• Research objects fit well with DataONE packages
• Workflow and provenance fit well with Research Objects
• PROV for provenance
• PROV-W for workflow and provenance
• Semantic annotations fit well with PROV-W

References

[MLB+10] Missier, Paolo, Bertram Ludascher, Shawn Bowers, Manish Kumar Anand, Ilkay Altintas, Saumen Dey, Anandarup Sarkar, Biva Shrestha, and Carole Goble. “Linking Multiple Workflow Provenance Traces for Interoperable Collaborative Science.” In Procs. 5th Workshop on Workflows in Support of Large-Scale Science (WORKS), 2010.

[MLB+12] Missier, Paolo, Bertram Ludascher, Shawn Bowers, Ilkay Altintas, Saumen Dey, and Michael Agun. “Golden Trail: Retrieving the Data History That Matters from a Comprehensive Provenance Repository.” International Journal of Digital Curation 7, no. 1 (2012). http://www.dcc.ac.uk/events/idcc11.

[MSZ+10] Missier, Paolo, Satya S Sahoo, Jun Zhao, Amit Sheth, and Carole Goble. “Janus: From Workflows to Semantic Provenance and Linked Open Data.” In Procs. IPAW 2010. Troy, NY, 2010. http://www.springerlink.com/content/am3551t4q4614r47/.

[ZSM+11] Zhao, Jun, Satya S Sahoo, Paolo Missier, Amit Sheth, and Carole Goble. “Extending Semantic Provenance into the Web of Data.” IEEE Internet Computing 15, no. 1 (2011): 40–48. http://doi.ieeecomputersociety.org/10.1109/MIC.2011.7.

