Realities in Science Data and Information - Let's go for translucency
AGU FM10 IN13B-02
Peter Fox (RPI) [email protected] World Constellation
And the reality?
It’s about the questions that are being asked, e.g.
• When was the last sensor calibration and who did it, why was it done and where are the results?
• Exactly what physics routines went into this model run and how do I know this is the actual output it generated (and that it has not been altered)?
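The "has it been altered?" question can be illustrated with a cryptographic digest recorded when the output is produced and re-checked later. This is only a minimal sketch: the function names and the sample bytes are hypothetical, and the slides do not prescribe any particular mechanism.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, recorded_digest: str) -> bool:
    """True if the data still matches the digest recorded at generation time."""
    return sha256_digest(data) == recorded_digest


# Hypothetical model output and the digest stored alongside its provenance.
model_output = b"temperature,pressure\n288.15,101325\n"
recorded = sha256_digest(model_output)
```

A digest only answers the integrity part of the question; which physics routines were used still has to come from recorded provenance.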
The ecosystem?
• These are what enable scientists or anyone to explore, confirm or deny their ‘hunches’ or get answers to direct questions…
Accountability
Proof
Explanation
Justification
Verifiability
‘Transparency’ (the illusion of it)
Trust
Provenance - Internal/ External
Identity
Why an illusion?
• It’s not that the word transparency is wrong, it is what it is being used for
– “If I let you see everything, you can get answers to your questions”
• Nope, not unless you are very lucky…
• It depends on
– Who is asking the question (and why)
– What the answer will be used for
– CONTEXT and ROLE (poorly represented)
But back to reality
Fragmentation
Disconnection
Encapsulation
Data as service
… all are bad for the questions that are being asked
So … translucency
• See just what is necessary and sufficient
• Practical definition
– As close to the relevant data, information and knowledge artifacts presented in an appropriate form
– Damn, yes, I mean curation
• Methodological means
– Lenses (with filters, roles if possible)
– Bags
– Logic, i.e. rules
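One way to picture a lens with a role filter is a per-role set of visible fields applied to a provenance record. The roles, field names and record below are hypothetical, chosen to echo the sensor calibration question earlier in the deck; this is a sketch of the idea, not an implementation from the talk.

```python
from typing import Any, Dict, Set

# Hypothetical lenses: which fields each role gets to see.
LENSES: Dict[str, Set[str]] = {
    "instrument_scientist": {
        "sensor_id", "calibration_date", "calibrated_by", "calibration_report",
    },
    "data_user": {"sensor_id", "calibration_date"},
}


def apply_lens(record: Dict[str, Any], role: str) -> Dict[str, Any]:
    """Return only the fields the role's lens lets through (nothing if unknown)."""
    visible = LENSES.get(role, set())
    return {key: value for key, value in record.items() if key in visible}


# Hypothetical calibration record.
record = {
    "sensor_id": "S-01",
    "calibration_date": "2010-11-30",
    "calibrated_by": "operator A",
    "calibration_report": "report-17",
}
```

Here `apply_lens(record, "data_user")` keeps just the sensor and the calibration date: necessary and sufficient for that role, rather than everything.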
Some of this is, er…
• Provenance - Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility
• Knowledge provenance; enrich with semantics (especially the relations between concepts previously isolated, and retaining context) and semantically-aware tools
Complexity (see IN43C-05)
And some …
• Identity
– YOUR identity
– Friends, organizations
– Communities
– Peer and non-peer relations
• Accountability
– By whom, to whom
– When and how often
• Documentation – are you happy Ted?
We need a Knowledge Base
Access Control Essential For Establishing Trust
• Licensing
• Intellectual property
• Security/defence
• Endangered species
• Sensitive Data/Information
• Defining authorized access
Proof Markup Language
PML
• Justification
– Explanation
– Causality graph
• Provenance
– Conclusion
– Source
– Engine
– Rule
• Trust
– Trust/Belief metrics
[Figure: PML structure. Nodes: NodeSet, Justification, Conclusion, Engine, Rule, SourceUsage, Source, DateTime. Relations: hasAntecedentList, hasSourceUsage, hasInferenceRule, hasInferenceEngine.]
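The node and relation names on this slide can be mirrored in a small data structure to show how a causality graph is walked back to its sources. This is an illustrative sketch only: the Python classes are not a PML API or serialization, the way the pieces are linked is an assumption based on the relation names, and the example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SourceUsage:
    source: str      # what was consulted
    date_time: str   # when it was consulted


@dataclass
class Justification:
    inference_engine: Optional[str] = None              # cf. hasInferenceEngine
    inference_rule: Optional[str] = None                # cf. hasInferenceRule
    antecedents: List["NodeSet"] = field(default_factory=list)  # cf. hasAntecedentList
    source_usage: Optional[SourceUsage] = None          # cf. hasSourceUsage


@dataclass
class NodeSet:
    conclusion: str
    justifications: List[Justification] = field(default_factory=list)


def collect_sources(node: NodeSet) -> List[str]:
    """Walk the justification graph and list every source it rests on."""
    found: List[str] = []
    for justification in node.justifications:
        if justification.source_usage is not None:
            found.append(justification.source_usage.source)
        for antecedent in justification.antecedents:
            found.extend(collect_sources(antecedent))
    return found


# Hypothetical two-step chain: a raw reading justifies a calibrated value.
raw = NodeSet("raw sensor counts", [
    Justification(inference_rule="direct assertion",
                  source_usage=SourceUsage("instrument log", "2010-12-13T00:00:00Z")),
])
calibrated = NodeSet("calibrated irradiance", [
    Justification(inference_engine="calibration pipeline",
                  inference_rule="apply gain", antecedents=[raw]),
])
```

Calling `collect_sources(calibrated)` follows the antecedent list down to the instrument log, which is the kind of traversal an explanation tool would need.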
Open Provenance Model
• Agents
– Catalyst and controlling entity of a process
• Processes
– Action or series of actions performed resulting in new artifacts
• Artifacts
– Immutable piece of state
• Roles
– Non-semantic flat tags used to provide context in relations
[Figure: OPM relations. Process used(Role) Artifact; Artifact wasGeneratedBy(Role) Process; Process wasControlledBy(Role) Agent; Artifact wasDerivedFrom(Role) Artifact; Process wasTriggeredBy(Role) Process.]
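The OPM relations can be sketched as role-tagged edges pointing from an effect back to its cause, which makes "where did this artifact come from?" a simple graph walk. The node names and roles below are hypothetical, and the representation is an assumption made for illustration rather than an OPM serialization.

```python
from typing import List, NamedTuple, Set


class Edge(NamedTuple):
    effect: str    # the node that depends on the cause
    relation: str  # an OPM relation name
    cause: str     # the node it depends on
    role: str      # flat tag giving context to the relation


# Hypothetical provenance graph for one calibration run.
EDGES: List[Edge] = [
    Edge("calibrated_data", "wasGeneratedBy", "calibration_run", "output"),
    Edge("calibration_run", "used", "raw_data", "input"),
    Edge("calibration_run", "wasControlledBy", "operator", "supervisor"),
    Edge("calibrated_data", "wasDerivedFrom", "raw_data", "source"),
]


def lineage(node: str, edges: List[Edge]) -> Set[str]:
    """Return every artifact, process and agent the node depends on."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for edge in edges:
            if edge.effect == current and edge.cause not in seen:
                seen.add(edge.cause)
                stack.append(edge.cause)
    return seen
```

For this graph, `lineage("calibrated_data", EDGES)` reaches the process, the raw artifact and the controlling agent.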
E.g. Knowledge Base – see Zednik et al. IN43C-06
My suggestion(s)
• Accommodation of dynamic content in an open (web) environment (distrust)
• Filter/lens models and implementations in tools/applications
• Declarative semantics to formalize the meaning/terms and relations - progress
• Rules to define the combinations of evidence required - starting
• “In their face” end-user modeling – getting real use cases for presentation of ‘facts’
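The suggestion about rules for combinations of evidence can be pictured as a table of alternative evidence sets per claim, where a claim is supported if any one set is fully present. The claim and evidence names are hypothetical; this is a sketch of the idea, not a rule language proposed in the talk.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical rules: each claim lists alternative sets of required evidence.
RULES: Dict[str, List[FrozenSet[str]]] = {
    "model_output_is_authentic": [
        frozenset({"checksum_match", "run_configuration"}),
        frozenset({"signed_by_producer"}),
    ],
}


def is_supported(claim: str, evidence: Set[str]) -> bool:
    """True if the available evidence satisfies at least one rule for the claim."""
    return any(required <= evidence for required in RULES.get(claim, []))
```

With only `{"checksum_match"}` available the claim is not supported; adding `"run_configuration"` completes the first combination.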