Resource Curation and Automated Resource Discovery
NIF Resources
• NIF is cataloging websites that house information about databases, atlases, software tools, data, transgenic mice and other things that we consider of value to the neuroscience community.
Definition of Resource
• Individual resource boundary: a site shall be considered an individual resource if it is maintained by a single entity and consists of one or more individual web pages that are related by a theme and by HTML links.
Resource Nomination
[Diagram: resource nomination pipeline; labels listed below]
• Registry (4,500)
• Public Registry (2,100)
• NIF Web (499,952)
• Level 2/3 (24)
• User feedback
• *Automated tools
• Web crawl
• Registry subset
• Nomination
• Check: links, annotation, vocabulary
• *Automated updates
• Level 2 tools
* In development
Resource is Nominated
• Sources: NIF staff, contact at meetings, web form
• In NIF already?
• Assign metadata
– short name, long name, url
– description (short description of 1-3 sentences, longer description)
– parent organization (physical location, university)
– support (grant numbers)
– keywords (species, technique, structure, age, level, disease, topic)
• Decision: should it be included?
– Assign resource type
– Do not include: keep record
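The nomination steps above can be sketched in code. This is a minimal illustration, not the NIF implementation: the `ResourceRecord` class, the `curate` function, and the choice to key the registry by url are all assumptions made for the example. The field names follow the metadata listed on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ResourceRecord:
    # Hypothetical record type; fields mirror the metadata list on the slide.
    short_name: str
    long_name: str
    url: str
    short_description: str = ""          # 1-3 sentences
    long_description: str = ""
    parent_organization: str = ""        # physical location, university
    support: List[str] = field(default_factory=list)               # grant numbers
    keywords: Dict[str, List[str]] = field(default_factory=dict)   # e.g. {"species": ["mouse"]}
    resource_type: Optional[str] = None
    included: bool = False


def curate(record, registry, include, resource_type=None):
    """Walk one nomination through the flowchart.

    registry is a dict keyed by url (an assumption of this sketch).
    include is the curator's decision; it is not computed here.
    """
    if record.url in registry:           # "In NIF already?"
        return "already in NIF"
    record.included = include
    if include:
        record.resource_type = resource_type
    # Rejected nominations are stored too, so the decision is on record.
    registry[record.url] = record
    return "included" if include else "not included, record kept"
```

Keeping rejected nominations in the same store is one way to read "keep record": a resource that was already reviewed and declined is then recognized when it is nominated again.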
Resources Difficult to Categorize
• Link aggregates
• Large organizations (NIH)
• Poorly documented databases
• Private data sites
• Clinical trials that are still recruiting
– Experimental protocol
• Commercial entities
• Journals
– JOVE
– supplemental materials
CINdy: the resource curation tool
Resource Ontology (BRO)
• Data Resource: provides access to data; database, atlas, book
• Software Resource: software programs or source code
• Material Resource: reagents, tissue samples or organisms
• Funding Resource: grants or contracts
• Training Resource: educational materials, training programs
• Job Resource: employment opportunities
• People Resource: access to individual people’s web sites
NIF Service vs BRO Service
Solutions: Consolidating Classes
• Synonyms where appropriate: ex. Material storage service vs. Material storage repository.
• Temporary mapping, where appropriate
– *Deprecated terms must be maintained*
• Data loss
• Moving forward with a joint descriptive terminology!
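The synonym and temporary-mapping approach above can be illustrated with two lookup tables. This is only a sketch under stated assumptions: the slide does not say which of the two "Material storage" terms is preferred, so the direction of the synonym entry is arbitrary, and the deprecated entry is a made-up placeholder.

```python
# Hypothetical tables; which label is "preferred" is an assumption.
SYNONYMS = {
    "Material storage repository": "Material storage service",
}

# Deprecated terms stay in the table so that old annotations still resolve.
DEPRECATED = {
    "Example deprecated term": "Material storage service",  # placeholder entry
}


def resolve_term(term):
    """Return (preferred_term, status) for a class label."""
    if term in DEPRECATED:
        return DEPRECATED[term], "deprecated"
    if term in SYNONYMS:
        return SYNONYMS[term], "synonym"
    return term, "preferred"
```

Returning the status next to the term lets a curator see which annotations still rely on a temporary mapping.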
Evolution of the NIF Resource Ontology
• Object
– Materials (Biomaterials, Reagents)
– Software
– People
– Grants
– Jobs
– Information
• Function
– Service (Storage, Production)
– Funding
– Job Service
– Community-building
• Target Audience
– General
– Kids
– Student
– Medical
– Researcher
• Data Type
– Structured (Database, Atlas)
– Unstructured (Journal, Webpage)
• Data Format
– Text
– RDF Text
– Picture
– Video
Resource Boundary?
• Software Library
– Software tool
• Plugin: I2B2
• Our solution: use url as a uniqueness qualifier
– Our problem: a single url may house several resources
– Individual plugins can have individual urls
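A short sketch of what using the url as a uniqueness qualifier might look like, and how the "several resources behind one url" problem shows up. The normalization rules here (drop the scheme, a leading "www.", and a trailing slash; ignore query strings) and the function names are assumptions for illustration only.

```python
from urllib.parse import urlsplit


def url_key(url):
    """Reduce a url to a comparison key: lowercase host plus path.

    Assumed rules: scheme, leading "www.", trailing slash and query
    string are ignored.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def find_collisions(resources):
    """resources: list of (name, url) pairs.

    Returns {key: [names]} for every key claimed by more than one resource.
    """
    by_key = {}
    for name, url in resources:
        by_key.setdefault(url_key(url), []).append(name)
    return {key: names for key, names in by_key.items() if len(names) > 1}
```

A collision does not say which entry is wrong. It only flags the pair for a curator, for example a software library and a plugin that were both registered under the library's page.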
Boundary cont.
• Individual resource boundary: shall be considered an individual resource if it is maintained by a single entity, and has the properties of one or more individual web pages that are related by a theme and html links.
• Solution to random boundary problem: Human Curator
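An automated tool could still propose candidate boundaries for the human curator to accept or reject. The sketch below groups pages that link to each other within the same host. Treating "same host" as a stand-in for "maintained by a single entity" is an assumption of this example, and the "related by a theme" part of the definition is not captured at all.

```python
from urllib.parse import urlsplit


def candidate_resources(links):
    """links: iterable of (from_url, to_url) pairs found by a crawl.

    Returns a list of candidate resources, each a sorted list of urls
    connected by links that stay on one host.
    """
    neighbours = {}
    for src, dst in links:
        neighbours.setdefault(src, set())
        neighbours.setdefault(dst, set())
        # Only same-host links join two pages into one candidate.
        if urlsplit(src).hostname == urlsplit(dst).hostname:
            neighbours[src].add(dst)
            neighbours[dst].add(src)

    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:                     # depth-first walk of one component
            page = stack.pop()
            group.append(page)
            for nxt in neighbours[page]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups
```

The output is a starting point only. Two projects hosted on one university server would be merged by this rule, which is exactly the kind of case that still needs a curator.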
Issues of Scope
• Single line or short paragraph + keywords
– Resource discovery problem
* The Stanford ontologies description is very short (as are many); finding this resource by keyword will be difficult unless we index the content of the website.
• Data dump
– Small vs. large databases
– Updates
Internal referencing
• Stanford example:
– License: “same as bioportal”; does not match any license types in any list.
– Problem: non-standard terminology, reference to another project (no url), can create loops
• also true in publications: ex., used same protocol as paper X, which used the same protocol as paper Y
– Automated text mining tools have a hard time recognizing these
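Once a "same as X" statement has been recognized (the hard part, per the slide), following the chain and catching loops is mechanical. A minimal sketch, assuming the references have already been extracted into a dictionary; the function name and data layout are hypothetical.

```python
def resolve_reference(name, refers_to, known_values):
    """Follow "same as ..." references starting from name.

    refers_to:    dict mapping a name to the name it points at
    known_values: dict mapping a name to a concrete value (license, protocol)

    Returns (value, chain). value is None when the chain ends at
    something with no known value. Raises ValueError on a loop.
    """
    chain = [name]
    while name not in known_values:
        if name not in refers_to:
            return None, chain           # dangling reference, e.g. no url given
        name = refers_to[name]
        if name in chain:
            raise ValueError("reference loop: " + " -> ".join(chain + [name]))
        chain.append(name)
    return known_values[name], chain
```

In the Stanford example the chain would stop at "bioportal" with no value unless the BioPortal license had been recorded separately, so the result would be `None` plus the chain that led there.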
What can we gain from automated systems?
• Basic information: Name, url, contact info
• Some keywords
• Some descriptive text
• No resource boundary
• No resource description
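As a rough illustration of the "basic information" an automated pass can recover, the sketch below reads the page title and the keywords and description meta tags with Python's standard `html.parser`. It assumes the page declares those tags, which many do not. The class and function names are made up for the example, and contact information is not handled.

```python
from html.parser import HTMLParser


class BasicInfoParser(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def basic_info(html, url):
    """Return name, url, keywords and descriptive text found in one page."""
    parser = BasicInfoParser()
    parser.feed(html)
    raw = parser.meta.get("keywords", "")
    keywords = [k.strip() for k in raw.split(",") if k.strip()]
    return {
        "name": parser.title.strip(),
        "url": url,
        "keywords": keywords,
        "description": parser.meta.get("description", ""),
    }
```

Nothing here decides where the resource begins or ends, or whether the meta description actually describes the resource. Those are the two gaps the slide lists.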
How do we help the computers?
• Common naming project (neurocommons): http://sharedname.org/page/Main_Page
• Automated uri’s
• Community building:
– Shared data models
– Shared ontology
– RDF entity tags? (mouse vs mouse)