Transparency, applications, and ab-stuff – effect on tools for e-science:
it’s all about Informatics
June 21, 2010, IATUL 2010
Peter Fox (RPI and WHOI) [email protected] Tetherless World Constellation
Working premise
Scientists – actually ANYONE – should be able to access a global, distributed knowledge base of scientific data that:
• appears to be integrated
• appears to be locally available
But… data and information are obtained by multiple means (instruments, models, analysis) using various (often opaque) protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) metadata. They may be inconsistent, incomplete, evolving, and distributed, AND created in a form that facilitates generation, not use (except by accident).
And… there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…
Data has Lots of Audiences
From “Why EPO?”, a NASA internal report on science education, 2005
More Strategic
Less Strategic
SCIENTISTS!
Tools
Means of conduct
So what about abduction?
• No, not the criminal meaning…
• It is a method of logical inference introduced by Peirce that comes prior to induction and deduction; the colloquial name for it is having a "hunch".
• Abductive reasoning starts when an inquirer considers a set of seemingly unrelated facts, armed with an intuition that they are somehow connected.
• The term abduction is commonly presumed to mean the same thing as hypothesis; however, an abduction is actually the process of inference that produces a hypothesis as its end result.
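The distinction between abduction (the process) and hypothesis (its result) can be made concrete with a minimal sketch. It assumes a small hand-written rule base that maps each candidate hypothesis to the observations it would explain; the rule base, the observation strings, and the function name `abduce` are all invented for illustration and are not part of any tool described in these slides.

```python
# Hypothetical rule base: each candidate hypothesis maps to the
# observations it would explain. All values are illustrative only.
RULES = {
    "ecosystem change": {"fewer scallops", "warmer water", "new species present"},
    "seasonal variation": {"warmer water"},
    "instrument drift": {"warmer water", "noisy images"},
}


def abduce(observations, rules):
    """Return (hypothesis, explained observations) pairs, best coverage first."""
    observed = set(observations)
    scored = []
    for hypothesis, explains in rules.items():
        covered = observed & explains
        if covered:  # keep only hypotheses that explain at least one fact
            scored.append((hypothesis, sorted(covered)))
    # Most observations explained first; ties broken alphabetically.
    scored.sort(key=lambda item: (-len(item[1]), item[0]))
    return scored


# A set of seemingly unrelated facts goes in; ranked hypotheses come out.
facts = ["fewer scallops", "warmer water", "new species present"]
candidates = abduce(facts, RULES)
for hypothesis, covered in candidates:
    print(f"{hypothesis}: explains {covered}")
```

The output of `abduce` is a list of hypotheses still to be tested; induction and deduction would come afterwards, which matches the ordering described above.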
Abductive Information System?
• What would this look like in application tools?
• If you accept that induction is fundamentally part of how an information system is developed, then how can the system allow for abduction before induction?
• Design factors? Architecture factors? Library factors? Cognitive factors?
Modern informatics enables a new scale-free** framework approach
• Use cases – requirements
• Stakeholders
• Distributed authority
• Access control
• Ontologies
• Maintaining identity
Marine habitat - change
Scallop, number, density
Scallop, size, shape, color, place
Scallop, shell fragment
Rock
What is this?
Flora or fauna?
Dirt/mud: one person’s noise is another person’s signal
Several disciplines: biology, geology, chemistry, oceanography
Several applications: science, fishing, habitat change, climate and environmental change, data integration
Complex inter-relations, questions
Use case: What is the temperature and salinity of the water and are these marine specimens usual or part of an ecosystem change?
Src: WHOI and the HabCam group
Multi-tiered interoperability
But back to reality
Fragmentation
Disconnection
Encapsulation
… all are bad for … transparency
What is the ecosystem?
• Just a few elements and they are scattered
Accountability
Proof, Explanation, Justification, Verifiability
Transparency
Trust
Access Control Essential For Establishing Trust
• Licensing
• Intellectual property
• Security/defence
• Endangered species
• Sensitive data
• Full life cycle data, information and knowledge management and stewardship
Provenance
• Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility
• Knowledge provenance; enrich with semantics (especially the relations between concepts previously isolated, and retaining context) and semantically-aware tools
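The elements of the provenance definition above can be sketched as a record type. This is a minimal illustration, not a standard schema: the class name `ProvenanceRecord`, its field names, and the sample values are assumptions chosen to mirror the wording of the definition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvenanceRecord:
    """Hypothetical record; one field per element of the definition."""
    source: str          # origin or source from which the item comes
    intended_use: str    # intention for use
    generated_by: str    # who or what generated it
    generated_for: str   # who or what it was generated for
    method: str          # manner of manufacture
    place: str           # sense of place of production or discovery
    time: str            # time of production or discovery
    owners: List[str] = field(default_factory=list)  # subsequent owners

    def missing_fields(self):
        """Names of text fields left blank, in declaration order."""
        return [
            name
            for name, value in vars(self).items()
            if isinstance(value, str) and not value.strip()
        ]

    def is_documented(self):
        """True when no text field is blank (a proxy for 'sufficient detail')."""
        return not self.missing_fields()


# Illustrative values only; they do not describe a real data set.
record = ProvenanceRecord(
    source="towed camera survey",
    intended_use="habitat change study",
    generated_by="imaging instrument",
    generated_for="survey team",
    method="",  # not recorded, so reproducibility is at risk
    place="continental shelf",
    time="2008-06-01T00:00:00Z",
)
print(record.missing_fields())
```

A completeness check like this only covers the first bullet. The second bullet, relating concepts and retaining context, needs semantics that a flat record cannot express.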
MODIS Terra & Aqua vs. AIRS Cloud Top Pressure
AIRS vs. MODIS Aqua
AIRS vs. MODIS Terra
MODIS Aqua vs. MODIS Terra
Correlation maps for Jan 1 – 16, 2008
Impact: Findings using aerosol data apply to other geophysical parameters!
About your selected parameters:

                         Parameter A                       Parameter B                       Difference alert
Parameter name           Aerosol Optical Depth at 550 nm   Aerosol Optical Depth at 550 nm
Dataset                  MYD08_D3.005                      MOD08_D3.005                      Diff
Data-day definition      UTC (00:00-24:00Z)                UTC (00:00-24:00Z)                The same but…
Temporal resolution      Daily                             Daily
Spatial resolution       1x1 degree                        1x1 degree
Sensor                   MODIS                             MODIS
Platform                 Aqua                              Terra                             Diff
EQCT                     13:30                             10:30                             Diff
Day time node            Ascending                         Descending                        Diff
Pre-Giovanni processes   ATBD-MOD-30                       ATBD-MOD-30
Giovanni processes       Spatial subset, time average      Spatial subset, time average
Your Selected Options:
Spatial Area: Longitude (-30, 150), Latitude (-10, 60)
Parameters:
A: MYD08_D3.005 Aerosol Optical Depth at 550 nm
B: MOD08_D3.005 Aerosol Optical Depth at 550 nm
Temporal Range: Begin Date: Jan 01 2008; End Date: Jan 31 2008
Visualization Function: Lat-Lon map, time-averaged
Continue process to display image Return to selection page
Known Issues: The two platforms differ in EQCT and day time node. Modulated by the data-day definition, this produces a difference in the overpass times included in each data day, which shows up as an artifact in the comparison. See sample images:
MODIS Terra vs. MODIS Aqua AOD Correlation Included Overpass time Difference
Semantic Advisor
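The "Difference alert" column of the parameter comparison can be approximated by a field-by-field comparison of two metadata records. The sketch below copies the metadata values from the comparison table; the dictionary layout and the function name `difference_alerts` are assumptions for illustration and do not describe how Giovanni or the advisor is implemented.

```python
# Metadata values copied from the comparison table; the dictionary
# layout and the function name are assumptions for illustration.
PARAM_A = {
    "Parameter name": "Aerosol Optical Depth at 550 nm",
    "Dataset": "MYD08_D3.005",
    "Data-day definition": "UTC (00:00-24:00Z)",
    "Temporal resolution": "Daily",
    "Spatial resolution": "1x1 degree",
    "Sensor": "MODIS",
    "Platform": "Aqua",
    "EQCT": "13:30",
    "Day time node": "Ascending",
}

# Parameter B shares most values with A; only these four differ.
PARAM_B = {
    **PARAM_A,
    "Dataset": "MOD08_D3.005",
    "Platform": "Terra",
    "EQCT": "10:30",
    "Day time node": "Descending",
}


def difference_alerts(a, b):
    """Return the field names whose values differ between two records."""
    return [key for key in a if a.get(key) != b.get(key)]


alerts = difference_alerts(PARAM_A, PARAM_B)
print(alerts)
```

A plain equality test flags Dataset, Platform, EQCT, and Day time node, but it cannot raise the "The same but…" alert on the data-day definition: that field is identical in both records and only becomes an issue in combination with the differing EQCT and node. Capturing that interaction is where semantics are needed.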
Tetherless World Constellation, tw.rpi.edu
Themes
Future Web
• Web Science
• Policy
• Social
Xinformatics
• Data Science
• Semantic eScience
• Data Frameworks
Semantic Foundations
• Knowledge Provenance
• Ontology Engineering Environments
• Inference, Trust
Hendler
Fox
McGuinness
Multiple depts/schools/programs ~ 30 (Post-doc, Staff, Grad, Ugrad)
Partitioning
Data Science
Xinformatics
Semantic eScience
Back shed
Use cases
1. Do you have any data online from Hutchins from award number OCE-0423418?
2. I want to download (temperature, biological, ...) data in the following areas (N. Atlantic, bounding box, where the JGOFS survey was done, ...).
3. What new data has been added to this repository since last year (organized by project)?
4. Show me all the places where the surface temperature in the North Atlantic is 25 degrees during June.
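Use case 4 can be read as a filter over space, time, and value. The sketch below is one possible reading under stated assumptions: the `Observation` type, the bounding box used for "North Atlantic", the Celsius unit, the tolerance, and the sample records are all hypothetical, since the use case does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical observation record; not a schema from the repository."""
    lat: float
    lon: float
    month: int             # 1 = January ... 12 = December
    surface_temp_c: float  # assumed to be degrees Celsius


# Assumed bounding box for "North Atlantic"; the use case gives no coordinates.
NORTH_ATLANTIC = {"lat": (0.0, 65.0), "lon": (-80.0, 0.0)}


def places_matching(observations, box, month, target_c, tolerance_c=0.5):
    """Return (lat, lon) of observations in the box and month near target_c."""
    lat_min, lat_max = box["lat"]
    lon_min, lon_max = box["lon"]
    return [
        (o.lat, o.lon)
        for o in observations
        if lat_min <= o.lat <= lat_max
        and lon_min <= o.lon <= lon_max
        and o.month == month
        and abs(o.surface_temp_c - target_c) <= tolerance_c
    ]


# Made-up sample records, chosen so each filter rejects one of them.
sample = [
    Observation(30.0, -60.0, 6, 25.2),   # in the box, June, close to 25
    Observation(45.0, -30.0, 6, 15.0),   # in the box, June, too cold
    Observation(30.0, -60.0, 1, 25.0),   # in the box, wrong month
    Observation(-20.0, -30.0, 6, 25.0),  # June, 25, but outside the box
]
matches = places_matching(sample, NORTH_ATLANTIC, month=6, target_c=25.0)
print(matches)
```

Every assumption listed above (what counts as "North Atlantic", which unit "25 degrees" is in, how close is close enough) is a choice the user never stated, so a tool that answers the question has to either make it visible or ask.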
Quick prototype of use case 1
Current version
Current version