ProvAbs: model, policy, and tooling for abstracting PROV graphs

IPAW 2014 – P. Missier

Paolo Missier, Jeremy Bryans, Carl Gamble

School of Computing Science, Newcastle University

Vasa Curcin, Roxana Danger

Imperial College, London

IPAW’14

Köln, June 10th, 2014

Motivation: partial disclosure of provenance

Consumer:
• Motivated to acquire and act upon analysis
But: expects supporting evidence, to mitigate the risk of acting upon inaccurate information

Provider:
• Motivated to provide accurate analysis to Public Agencies
• Enhances communication using provenance metadata as evidence
But: cannot fully disclose sources, analysis methods, etc.

Provenance-enabled data exchanges

Provenance exchange as part of data exchange

Provenance abstraction

What:
• Abstraction model for PROV
• Policy model and language to drive the abstraction
• Implementation: the ProvAbs tool

Why:
• To enable data exchanges with partial disclosure of the data provenance
• To simplify understanding of provenance traces by humans

How:
• Graph rewriting, from valid PROV to valid PROV
• A node grouping operator

Provenance views

Motivation similar to the UserViews model (*):

• Heavily focused on workflows and their provenance
• Scenario: one (or more) workflows, multiple users/viewers
• Relies on “composite modules” (sub-workflow structuring): real workflow → induced workflow

Goals:
1. Construct relevant user views
2. The answer to a provenance query depends on the workflow view

In contrast, in our work:

No assumption on any process specification (formal or not) driving the views on provenance

(*) Biton, O., S. Cohen-Boulakia, S. B. Davidson, and C. S. Hara. “Querying and Managing Provenance through User Views in Scientific Workflows.” In ICDE, 1072–1081, 2008. doi:10.1109/ICDE.2008.4497516.

History of an analyst’s report

Document produced by the “incident room analysts”

1 – Define policy to assign sensitivity to graph nodes

list classifications[protect, restricted, confidential, secret, topSecret];

for all (activity used data) where (data.Status > confidential in classifications)
    setSensitivity(activity, 7);
for all (activity used data) where (data.Status <= confidential in classifications)
    setSensitivity(activity, 5);
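
The effect of the policy above can be sketched in Python. This is only an illustration, not the ProvAbs policy engine: the function names, the input shapes and the example data are hypothetical, and so is the rule for an activity matched by both statements (here the higher level wins).

```python
# Ordered from least to most sensitive, mirroring the policy's classification list.
CLASSIFICATIONS = ["protect", "restricted", "confidential", "secret", "topSecret"]


def rank(status):
    """Position of a classification in the ordered list."""
    return CLASSIFICATIONS.index(status)


def assign_sensitivity(usages, status):
    """usages: (activity, data) pairs, one per 'activity used data' relation.
    status: dict mapping each data item to its classification."""
    threshold = rank("confidential")
    sensitivity = {}
    for activity, data in usages:
        level = 7 if rank(status[data]) > threshold else 5
        # Assumption: when both rules match one activity, keep the higher level.
        sensitivity[activity] = max(sensitivity.get(activity, 0), level)
    return sensitivity


# Hypothetical example data.
usages = [
    ("analyse", "fieldReport"),
    ("analyse", "pressCutting"),
    ("summarise", "pressCutting"),
]
status = {"fieldReport": "secret", "pressCutting": "restricted"}
result = assign_sensitivity(usages, status)
```

Here "analyse" uses a secret item and so gets sensitivity 7, while "summarise" only uses restricted data and gets 5.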

2 – Node selection

Select nodes for abstraction based on the receiver’s clearance level
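
A minimal sketch of the selection step, assuming that a node is selected for abstraction when its sensitivity is strictly greater than the receiver's clearance level. The node names are hypothetical; the sensitivities and clearance are those of the example.

```python
def select_for_abstraction(sensitivity, clearance):
    """Return the nodes the receiver is not cleared to see.

    Assumption: a node is hidden when its sensitivity exceeds the clearance.
    """
    return {node for node, level in sensitivity.items() if level > clearance}


# Sensitivities 7, 7, 7 and 5, receiver's clearance level 6.
sensitivity = {"n1": 7, "n2": 7, "n3": 7, "n4": 5}
selected = select_for_abstraction(sensitivity, clearance=6)
```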

[Figure: example graph with node sensitivities 7, 7, 7 and 5. Receiver's clearance level: 6.]

3 – Abstraction

Apply abstraction operator

[Figure: the example graph with node sensitivities 7, 7, 7 and 5, with the abstraction operator applied.]

Abstracting over sets of nodes

General abstraction idea: replace a group of (possibly non-contiguous) nodes with a new node

Naïve node group replacement: introducing cycles

Generation-usage cycles are legal in PROV

Note: initial focus on vanilla PROV: usage-generation/entity-activity
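
How a naïve replacement introduces a cycle can be shown on a three-node graph. The sketch below assumes edges are (source, target) pairs pointing the way PROV relations read, so ("a", "e1") stands for "a used e1" and ("e2", "a") for "e2 wasGeneratedBy a". The function names are illustrative and are not taken from ProvAbs.

```python
def naive_replace(edges, group, new_node):
    """Redirect every edge endpoint in `group` to `new_node`."""
    rewired = set()
    for src, dst in edges:
        src2 = new_node if src in group else src
        dst2 = new_node if dst in group else dst
        if src2 != dst2:  # drop edges that were internal to the group
            rewired.add((src2, dst2))
    return rewired


def has_cycle(edges):
    """Depth-first search for a directed cycle."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, set()).add(dst)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in succ.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(succ))


# e2 wasGeneratedBy a, a used e1: an acyclic chain.
base = {("e2", "a"), ("a", "e1")}
# Grouping the non-contiguous entities e1 and e2 into one new entity e.
abstract = naive_replace(base, {"e1", "e2"}, "e")
```

The base graph is acyclic, but the abstract graph states both "a used e" and "e wasGeneratedBy a".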

What’s wrong with cycles?

New cycles introduce new constraints on the temporal ordering of events

Example: u’ and g’ are forced to be simultaneous

More generally: mapping concrete to abstract events

Abstract graph nodes should be characterised by abstract events

• Generation is the completion of production of a new entity (PROV-DM Sec. 5.1.3)
• Usage is the beginning of utilizing an entity (PROV-DM Sec. 5.1.4)

g’ = max { g1, g2 }
u’ = min { u3, u4 }
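
The two formulas can be sketched as a small function, assuming event times are comparable numbers (the timestamps below are made up). Generation marks a completion, so the abstract generation is the latest concrete one; usage marks a beginning, so the abstract usage is the earliest.

```python
def abstract_events(generation_times, usage_times):
    """Map concrete generation/usage events to one abstract event of each kind."""
    g_abs = max(generation_times)  # g' = max { g1, g2, ... }
    u_abs = min(usage_times)       # u' = min { u3, u4, ... }
    return g_abs, u_abs


# Hypothetical timestamps: g1 = 3, g2 = 5, u3 = 8, u4 = 6.
g_abs, u_abs = abstract_events([3, 5], [8, 6])
```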

Usage-follows-generation

Abstract graphs with abstract usage-generation events correspond to a specific class of base graphs with the pattern:

<all generations> -- <all usages>

Given a grouping set of entities {e1 … en} such that, for each ei, either ei wasGeneratedBy a or a used ei:

all generation events for all ei must precede all usage events for all ei.
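
A sketch of checking the pattern on a grouping set, assuming each entity carries an optional generation time and a list of usage times as plain numbers. "Precede" is read here as non-strict (<=), which is an assumption; the input shape and example values are hypothetical.

```python
def usage_follows_generation(events):
    """events: dict entity -> (generation_time or None, list of usage times).

    True when every generation event precedes every usage event.
    """
    gens = [g for g, _ in events.values() if g is not None]
    uses = [u for _, us in events.values() for u in us]
    if not gens or not uses:
        return True  # nothing to order
    return max(gens) <= min(uses)


ok = {"e1": (1, [5]), "e2": (3, [4])}      # generations 1, 3; usages 5, 4
broken = {"e1": (1, [2]), "e2": (3, [4])}  # e1 is used before e2 is generated
```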

Naïve node group replacement (2): type violations

Criteria for abstraction

1. No new generation-usage cycles

2. No new dependencies (but: ok to remove some dependencies)

3. Satisfy type constraints on relationships

Steps: convexity by closure; replacement and rewiring; extension

Convexity by path closure
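
One way to read path closure is sketched below: extend the grouping set with every node that lies on a directed path between two of its members, so that the result is convex. This reading, the (source, target) edge encoding and the function names are assumptions made for illustration, and the graph is assumed to be acyclic.

```python
def reachable(adjacency, start):
    """All nodes reachable from any node in `start` by at least one edge."""
    seen, stack = set(), list(start)
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def path_closure(edges, group):
    """Add the nodes lying on paths between members of `group`."""
    succ, pred = {}, {}
    for src, dst in edges:
        succ.setdefault(src, set()).add(dst)
        pred.setdefault(dst, set()).add(src)
    downstream = reachable(succ, group)  # reachable from some group node
    upstream = reachable(pred, group)    # can reach some group node
    return set(group) | (downstream & upstream)


# e2 wasGeneratedBy a, a used e1, a used e0.
edges = {("e2", "a"), ("a", "e1"), ("a", "e0")}
closed = path_closure(edges, {"e1", "e2"})
```

The activity a sits on the path from e2 to e1 and is pulled into the group, while e0 is left out.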

Replacement, rewiring

Extension – restore type correctness

t-grouping

Nodes in the grouping set can be a mix of Entities or Activities

• When all boundary nodes are of the same type: grouping creates a node of that type
  • e-grouping: new Entity node
  • a-grouping: new Activity node

• Boundary nodes of mixed types: grouping can introduce a node of either type

t-grouping: creates a new node of type t ∈ { En, Act }

Note: grouping is commutative and closed w.r.t. composition
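
The rule for choosing t can be sketched as follows. The definition of a boundary node used here (a group member with at least one edge to or from a node outside the group) is an assumption for illustration, as are the function names and the example graph.

```python
def boundary_nodes(edges, group):
    """Group members with an edge crossing the group (assumed definition)."""
    boundary = set()
    for src, dst in edges:
        if (src in group) != (dst in group):
            boundary.add(src if src in group else dst)
    return boundary


def allowed_group_types(edges, group, node_type):
    """Types the new node may take: forced if the boundary is uniform."""
    kinds = {node_type[n] for n in boundary_nodes(edges, group)}
    return kinds if len(kinds) == 1 else {"En", "Act"}


# Chain: e3 wasGeneratedBy a2, a2 used e2, e2 wasGeneratedBy a1, a1 used e1.
edges = {("e3", "a2"), ("a2", "e2"), ("e2", "a1"), ("a1", "e1")}
node_type = {"e1": "En", "e2": "En", "e3": "En", "a1": "Act", "a2": "Act"}

uniform = allowed_group_types(edges, {"a2", "e2", "a1"}, node_type)
mixed = allowed_group_types(edges, {"e2", "a1"}, node_type)
```

The first group has only activities on its boundary, so it is an a-grouping; the second has one entity and one activity on its boundary, so either type is possible.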

t-grouping

[Figure: examples of a-grouping and e-grouping.]

The ProvAbs tool

• A tool to let a policy designer explore partial disclosure options by experimenting with policy settings and clearance thresholds
• Accepts graphs in PROV-N format
• Policy specified interactively, or loaded from file

Demo available!

Summary

A model for abstracting PROV graphs by (recursively) replacing sets of nodes with new nodes

• Map valid PROV to valid PROV – ref.: PROV-CONSTRAINTS

• No false dependencies introduced

Abstract nodes are characterised by abstract events

Extended to Agents (see TechReport)

Need to extend to more PROV relationship types

See also: Missier, P., Gamble, C., Bryans, J.: Provenance graph abstraction by node grouping. Technical report, Newcastle University (2013). http://www.ncl.ac.uk/computing/research/publication/194432
