In Search of What Some of It Means
RDA Semantics and Metadata Workshop
Feb 23, 2015
Peter Fox (RPI), Tetherless World Constellation
Metadata and documentation
Not more code!
Spectral synthesis components and flow
Getting the metadata?
What I wanted ~ 1994-6
Scientists should be able to access a global, distributed knowledge base of scientific data that:
• appears to be integrated
• appears to be locally available
But… data is obtained by multiple means (instruments, models, analysis) using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) metadata. It may be inconsistent, incomplete, evolving, and distributed. And it is almost always created in a manner that facilitates its generation, not its use.
And… there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…
What I was doing…
pro read_spec, spectra_name, description, auxiliary_info, model_size, mu_size, wave_size, model, smodel, mu, wave0, wavelength, intensity, brightness_temperature, index1, index2, percent
ncopts = 0;
description_start=0
description_edges=80
i=0
j=0
k=0
; Construct the DB filename
ncid=ncdf_open(string(getenv("SPECTRA")))
inq_struct=ncdf_inquire(ncid)
; /* get dimension info */
tmp_id = ncdf_dimid(ncid, "comment_dim")
ncdf_diminq,ncid, tmp_id, dummy, comment_dim
tmp_id=ncdf_dimid(ncid, "mu_dim")
ncdf_diminq,ncid, tmp_id, dummy, mu_dim
tmp_id=ncdf_dimid(ncid, "wave_dim")
ncdf_diminq,ncid, tmp_id, dummy, wave_dim
tmp_id=ncdf_dimid(ncid, "model_dim")
ncdf_diminq,ncid, tmp_id, dummy, model_dim
tmp_id=ncdf_dimid(ncid, "smodel_dim")
ncdf_diminq,ncid, tmp_id, dummy, smodel_dim
tmp_id=ncdf_dimid(ncid, "item_dim")
ncdf_diminq,ncid, tmp_id, dummy, item_dim
What I was doing… etc.
tmp_id = ncdf_varid (ncid, "description")
ncdf_varget,ncid, tmp_id, OFFSET=0,COUNT=comment_dim, description
; Id's for variables
tmp_id=ncdf_varid(ncid, "spectra_name")
ncdf_varget,ncid, tmp_id, OFFSET=0,COUNT=comment_dim, spectra_name
tmp_id=ncdf_varid(ncid, "auxiliary_info")
ncdf_varget,ncid, tmp_id, OFFSET=0,COUNT=comment_dim, auxiliary_info
tmp_id=ncdf_varid(ncid, "model_size")
ncdf_varget,ncid, tmp_id, OFFSET=0,COUNT=item_dim, model_size
start=intarr(1)
edges=intarr(1)
start(0)=0
edges(0)=model_size
tmp_id=ncdf_varid(ncid, "mu_size")
ncdf_varget,ncid, tmp_id, mu_size, OFFSET=start, COUNT=edges
tmp_id=ncdf_varid(ncid, "model")
ncdf_varget,ncid, tmp_id, model, OFFSET=start, COUNT=edges
start=intarr(2)
edges=intarr(2)
start(0)=0
edges(0)=smodel_dim
start(1)=0
edges(1)=model_size
tmp_id=ncdf_varid(ncid, "smodel")
ncdf_varget,ncid, tmp_id, smodel, OFFSET=start, COUNT=edges
What does It all Mean?
Some version of this…
[Diagram: Data, Information, Knowledge. Labels: Context; Presentation, Organization; Integration, Conversation; Creation, Gathering; Experience]
~Metadata?
It and Meaning
• It = things that matter
– Context
• Meaning = duh -> semantics
• Relations!! Real ones!
• But it was more than that, though that often comes later…
– Syntax (structure/form)
– Semantics (meaning)
– Pragmatics (use)
Metadata-Information-Knowledge Ecosystem
[Diagram: Metadata, Information, Knowledge. Labels: Context; Formalization, Organization; Integration, Shared Conceptualization; Creation, Gathering; Experience]
Provenance
• Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility
• Provenance: metadata in a given context! Swallow that.
• Knowledge provenance; meaning and relations in multiple contexts!
Perfect is the enemy of the good… (thanks Voltaire)
Origins …
• In 2000-2001 the need for capturing and preserving knowledge in science data became very clear but the barriers were high
• In 2004 we started a virtual observatory project based on semantic technologies
• Use case driven – in solar and solar-terrestrial physics with an emphasis on instrument-based measurements and real data pipelines; we needed implementations
• We knew we also needed integration and provenance (but that came later)
• We aimed to push semantics into our systems to build new ‘prototypes’ but we ‘failed’ ;-)
Tetherless World Constellation
In 2004
• 2004 – OWL was a W3C Recommendation!!
• Protégé 2.x and the Protégé-Java-OWL API
• SWOOP was a viable editor
• Jena and the Jena API were in good shape
• Pellet worked
• SPARQL was still a twinkle in the RDF working group’s eye
• Semantics were still the realm of computer scientists
Design and Development
• We made a conscious decision only to develop ontologies that were required to answer specific use cases and migrate metadata
– Both Classes AND Properties (uh-oh…)
• We made a conscious effort to use whatever ontologies were available (cf. trends in metadata… nuff said)
• We were pretty sure that rules would be needed (complex logic or late semantic binding)
• We ignored query (see implementation)
Use Case example
• Plot the neutral temperature from the Millstone Hill Fabry-Perot, operating in the non-vertical mode during January 2000, as a time series.
– Meanings and relations
• Objects=Things!
– Neutral temperature is a (temperature is a) parameter
– Millstone Hill is a (ground-based observatory is a) observatory
– Fabry-Perot is a interferometer is a optical instrument is a instrument
– Non-vertical mode is a instrument operating mode
– January 2000 is a date-time range
– Time is a independent variable/coordinate
– Time series is a data plot is a data product
• Metadata just appeared everywhere…
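The "is a" chains in the use case above can be written down as subject-predicate-object triples and followed transitively. What follows is a minimal sketch in plain Python, not the ontology built for the virtual observatory project; the term names (for example `FabryPerot`) and the helpers `ancestors` and `is_a` are invented here for illustration.

```python
# Hypothetical sketch: "is a" statements from the use case as
# (subject, predicate, object) triples. Term names are illustrative
# labels, not identifiers from any published ontology.
TRIPLES = {
    ("NeutralTemperature", "is_a", "Temperature"),
    ("Temperature", "is_a", "Parameter"),
    ("MillstoneHill", "is_a", "GroundBasedObservatory"),
    ("GroundBasedObservatory", "is_a", "Observatory"),
    ("FabryPerot", "is_a", "Interferometer"),
    ("Interferometer", "is_a", "OpticalInstrument"),
    ("OpticalInstrument", "is_a", "Instrument"),
    ("NonVerticalMode", "is_a", "InstrumentOperatingMode"),
    ("TimeSeries", "is_a", "DataPlot"),
    ("DataPlot", "is_a", "DataProduct"),
}


def ancestors(term, triples=TRIPLES):
    """Return every term reachable from `term` by following is_a links."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def is_a(term, kind, triples=TRIPLES):
    """True if `kind` is reachable from `term` through one or more is_a links."""
    return kind in ancestors(term, triples)
```

With these triples, `is_a("FabryPerot", "Instrument")` is true even though no single triple states it; the relation is derived by walking the chain, which is the kind of inference the use case relies on.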
Semantics - Modern informatics enables a new scale-free** framework approach
• Use cases
• Stakeholders
• Distributed authority
• Access control
• Ontologies
• Maintaining Identity
Semantics between 2004 and 2009
• Ontologies were needed for data integration and provenance and mediation for data mining
• Protégé 3.x and then 4.0 came out
• SWOOP development was interrupted
• Cmap added OWL predicate support*
• SPARQL became a recommendation
• Triple stores exploded in use and capability
• Linked Open Data started to take off
• Pellet 2.0 came out
• I used the “M” word less frequently!
Working with knowledge
Expressivity
Maintainability/Extensibility
Implementability
Working with semantics
Query
Rule execution
Inference
Semantics between 2009 and now
• Semantic data framework (SeSF)
• Substantial knowledge provenance work
• Data quality, uncertainty and bias representations and applications (oh, these are in production at NASA)
• Multi-sensor data synergy advisor
• Applications:
– Sea Ice, Carbon Observatory, Integrated Ecosystem Assessments, globalchange.gov, ocean.data.gov, energy.data.gov …
Respect and Mediation … how
Discovering new data
Information model
Ontology
Core and Framework Semantics - Multi-tiered interoperability
Closing thoughts
• Go ahead, create all the metadata you want, we’ll “materialize” some of it into triples based on semantics for use!
• Go ahead, create all the schema and encodings you want but remember – semantics now lives in an open world (some of it). You are not the only source of metadata. Not all formal. Link over map.
• Semantics make metadata useful but we do not need all of your metadata
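One way to read "materialize some of it into triples" is as a selective translation: only the metadata fields a use case needs are given a predicate, and everything else stays where it is. Below is a rough sketch in plain Python under that assumption; the record, its field names, the `ex:` predicate names and the function `materialize` are all invented for illustration and do not come from the systems described in this talk.

```python
# Hypothetical sketch: turn selected metadata fields into triples.
# The record, field names, and "ex:" predicates are invented for illustration.
FIELD_TO_PREDICATE = {
    "instrument": "ex:measuredBy",
    "parameter": "ex:hasParameter",
    "start": "ex:startTime",
}


def materialize(subject, metadata, mapping=FIELD_TO_PREDICATE):
    """Emit one triple per mapped, non-empty field; ignore all other fields."""
    triples = []
    for field, predicate in mapping.items():
        value = metadata.get(field)
        if value not in (None, ""):
            triples.append((subject, predicate, value))
    return triples


record = {
    "instrument": "Fabry-Perot",
    "parameter": "neutral temperature",
    "start": "2000-01-01",
    "file_format_version": "3",  # no use case needs this, so it is never materialized
    "comment": "",
}

triples = materialize("ex:dataset1", record)
```

The mapping table is the design choice that matters here: it is driven by use cases, so adding a new field to the source metadata changes nothing until some use case asks for it.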
Contact
• http://tw.rpi.edu
• @taswegian