SeaDataNet Ontology Use Case
Roy Lowry
British Oceanographic Data Centre
Coastal Atlas Interoperability Workshop, Corvallis, July 17-19 2007
(+ Lessons Learned)
Summary
What is SeaDataNet?
Some SeaDataNet semantic issues
What has SeaDataNet done?
What is SeaDataNet going to do?
Is SeaDataNet relevant to CAI?
What is SeaDataNet?
SeaDataNet in a Nutshell
Combine over 40 oceanographic data centres across Europe into a single interoperable data system
Approach is to adopt established standards and technologies wherever possible
Two phases:
Phase one brings 12 centres together with centralised metadata and distributed data as files. Due to be fully operational in autumn 2008 (beta next February)
Phase two introduces data virtualisation, aggregation, cutting and 30 more centres. Due in 2010
Project is well on its way up the interoperability operational implementation curve
SeaDataNet Semantic Issues
The major problem facing the project is heterogeneous legacy content
SeaDataNet inherited 3 independently-developed metadatabases
Each is heavily populated (3000-30000 records)
Each had its own independently developed controlled vocabularies
These vocabularies
– Covered overlapping domains
– Said similar things in different ways
– Provided a shining example of how NOT to manage vocabularies
Brief Diversion
Vocabularies can have two types of governance
Content governance
Mechanism for making decisions on vocabulary population
– Expected deliverables include:
» Vocabulary standards and internal consistency
» Change on a timescale matching the needs of the user community
» Terms with definitions!!!
Technical governance
Vocabulary storage, maintenance and serving
– Expected deliverables include:
» Convenient access to up-to-date vocabularies
» Clear, rigorous vocabulary versioning
» Version history through audit trails
» Maintenance that doesn’t break user systems
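The technical governance deliverables above (versioning, audit trails, non-destructive maintenance) can be illustrated with a minimal sketch. The `Vocabulary` class and its method names are hypothetical and invented for this example; they do not describe any actual vocabulary server.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Vocabulary:
    """Hypothetical in-memory vocabulary list with versioning and an audit trail."""
    name: str
    version: int = 0
    terms: dict = field(default_factory=dict)   # key -> current definition
    audit: list = field(default_factory=list)   # append-only change history

    def _log(self, action, key, old, new):
        # Every change bumps the list version and is recorded, never overwritten.
        self.version += 1
        self.audit.append({
            "version": self.version,
            "action": action,
            "key": key,
            "old": old,
            "new": new,
            "when": datetime.now(timezone.utc).isoformat(),
        })

    def add(self, key, definition):
        # Content rule from the slide: terms must come with definitions.
        if not definition.strip():
            raise ValueError("every term needs a definition")
        if key in self.terms:
            raise KeyError(f"{key} already exists; use modify()")
        self.terms[key] = definition
        self._log("add", key, None, definition)

    def modify(self, key, definition):
        if not definition.strip():
            raise ValueError("every term needs a definition")
        old = self.terms[key]
        self.terms[key] = definition
        self._log("modify", key, old, definition)

    def as_of(self, version):
        """Rebuild the list as it stood at an earlier version from the audit trail."""
        snapshot = {}
        for entry in self.audit:
            if entry["version"] > version:
                break
            snapshot[entry["key"]] = entry["new"]
        return snapshot
```

Because the old value is kept in the audit trail, a user system pinned to an earlier version can still recover the list it was built against by calling `as_of`.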
SeaDataNet Semantic Issues
Vocabulary content governance
Done by individuals who were often inadequately qualified to do the job
Metadata entry form with an ‘Add to Vocabulary’ button used by students
Vocabulary technical governance
Scattered files on servers or inaccessible database tables
Multiple data models (e.g. some with abbreviations, some without)
No versioning
Vocabularies updated by destructive overwrites
Harmonisation required for related vocabularies
Within centralised metadata
Between partner local systems and centralised metadata
What has SeaDataNet Done?
Established content governance
Within SeaDataNet (TTT e-mail list)
Further afield (SeaVoX e-mail list)
Established technical governance
Adopted the NERC DataGrid Vocabulary Server
– Heavily defended Oracle back end
– Automated version and audit trail management
– Web Service API front end plus clients e.g. http://vocab.ndg.nerc.ac.uk/client/vocabServer.jsp
– Currently serving out 75 lists
Established a Mapping Infrastructure
List entries connected by SKOS RDF triples
Operational mappings between parameter vocabularies (GCMD science keywords, CF Standard Names)
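As a rough illustration of list entries connected by SKOS triples, the sketch below holds mappings as plain (subject, predicate, object) tuples and looks them up in either direction. The term identifiers and the mappings themselves are made up for the example and are not the project's actual mappings; only the SKOS namespace and the mapping property names come from the SKOS vocabulary.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Illustrative triples only: identifiers and mappings are invented for this sketch.
MAPPINGS = [
    ("cf:sea_water_temperature", SKOS + "exactMatch", "gcmd:water_temperature"),
    ("cf:sea_water_salinity", SKOS + "broadMatch", "gcmd:salinity_density"),
]

# Reading a triple from the object's side requires the inverse property.
INVERSE = {
    SKOS + "exactMatch": SKOS + "exactMatch",
    SKOS + "broadMatch": SKOS + "narrowMatch",
    SKOS + "narrowMatch": SKOS + "broadMatch",
}


def mapped_terms(term, triples):
    """Return (predicate, other_term) pairs for every mapping that involves `term`."""
    hits = []
    for subject, predicate, obj in triples:
        if subject == term:
            hits.append((predicate, obj))
        elif obj == term and predicate in INVERSE:
            hits.append((INVERSE[predicate], subject))
    return hits
```

Storing each mapping once and deriving the inverse on lookup avoids keeping two copies of the same relationship in step.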
What is SeaDataNet Going To Do?
Harmonise centralised metadata vocabularies or map if too hard
Map centralised vocabularies to partner system vocabularies
Build metadata crosswalks and generators (e.g. from CF) that include semantics (Use case 1)
Implement ‘Smart Discovery’ for legacy plaintext, e.g. search for pigment, find chlorophyll (Use case 2)
Establish URLs to represent vocabularies and individual entries delivering XML – probably SKOS – documents
Extend mapping efforts to other areas such as ‘devices’
Release a much improved Vocabulary Server API (mid-August)
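The ‘Smart Discovery’ idea in the list above (search for pigment, find chlorophyll) might be sketched as query expansion over broader/narrower relationships before matching plaintext. The `NARROWER` table, the function names and the sample records are all assumptions made for illustration; a real system would draw the relationships from the vocabulary mappings.

```python
# Hypothetical broader -> narrower relationships, invented for this sketch.
NARROWER = {
    "pigment": ["chlorophyll", "carotenoid"],
    "chlorophyll": ["chlorophyll-a"],
}


def expand(term, narrower):
    """Return the term plus every narrower term reachable from it."""
    seen = []
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(narrower.get(current, []))
    return seen


def smart_search(query, records, narrower):
    """Return the records whose text mentions the query or any narrower term."""
    terms = expand(query.lower(), narrower)
    return [r for r in records if any(t in r.lower() for t in terms)]


RECORDS = [
    "Chlorophyll concentration from bottle samples",
    "CTD temperature profile",
    "Pigment analysis by HPLC",
]
```

With these sample records, a search for "pigment" also returns the chlorophyll record, even though the word pigment never appears in it.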
Is SeaDataNet Relevant to CAI?
This workshop is about building a coastal atlas ontology that brings together semantic resources that say similar things in different ways
The vocabulary entry semantic content may be different from oceanographic parameters, but the problem is essentially the same
If it works for SeaDataNet it will probably work for the CAI community
More important – if it didn’t work for SeaDataNet then it probably won’t work for CAI
Is SeaDataNet Relevant to CAI?
What has worked for SeaDataNet:
The NERC DataGrid Vocabulary Server
Content governance through a MODERATED e-mail list (also works pretty well for CF Standard Names)
Representing vocabulary terms by URNs in metadata documents
What I believe will work in the next 12 months:
Semantic interoperability through mappings
The conceptual framework of RDF in general and SKOS in particular
21st Century tooling
Is SeaDataNet Relevant to CAI?
What hasn’t worked for SeaDataNet:
Weak content governance
Examples
– Terms without definitions
– Vocabularies without strict entity definitions populated by mixed entities e.g.
» helicopter = class
» RRS Discovery = instance
– Vocabularies without managed deprecation
Poor technical governance
Example
– A vocabulary served by:
» Dynamic web page from database
» Static HTML page
» ASCII file as e-mail attachment
» Each having a different number of entries….
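The failure described above, one vocabulary served through several channels that drift apart, is the kind of thing a simple consistency check can catch. The sketch below is a hypothetical helper, not part of any SeaDataNet tooling: it takes the term keys seen on each channel and reports which keys each channel is missing.

```python
def compare_copies(copies):
    """Report, per channel, the term keys it lacks relative to all channels combined.

    `copies` maps a channel name to an iterable of term keys (illustrative input).
    Channels that hold every key are left out of the result.
    """
    sets = {name: set(keys) for name, keys in copies.items()}
    union = set().union(*sets.values())
    return {
        name: sorted(union - keys)
        for name, keys in sets.items()
        if union - keys
    }


# Made-up example: three served copies of the same list.
COPIES = {
    "database": ["A", "B", "C"],
    "html": ["A", "B"],
    "ascii": ["A", "C"],
}
```

An empty result means every channel serves the same entries; serving all channels from a single master copy removes the need for the check altogether.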
That’s All Folks!
Thank you for your attention
Any questions?
Morals
Always provide definitions for your terms
If you are going to use vocabularies to build an ontology make sure that they are properly governed