Terminology Metadata

Salvatore Mungal

Duke University

Extension of the Service Meta Model

Faro, Portugal, 16th November 2008

2

Agenda

● Background

● Review Proposed Model

● Next Steps

● Question/Discussion

3

Team Members

● Brian Davis (3rd Millennium)
● Frank Hartel (NCI)
● Tom Johnson (Mayo)
● George Komatsoulis (NCI)
● Hua Min (Fox Chase)
● Sal Mungal (Duke)
● Scott Oster (OSU)
● Mike Riben (MD Anderson)
● Denise Warzel (NCI)

4

Background - caBIG™

• Background - caBIG™:
– The cancer Biomedical Informatics Grid (caBIG™) is an information technology program that develops software tools to support cancer research by establishing a common infrastructure for sharing data and applications across organizations.
– Interoperability is a key factor in caBIG’s infrastructure and depends heavily on terminologies.

5

Background - caBIG™

• Background – caBIG™/Metadata:
– The growing demand for service-oriented access to terminologies in the caBIG community is expected to increase the publication of terminology services on caGrid.
– Efficient discovery, administration and query of these resources require consistent and reliable metadata from the cancer Data Standards Repository (caDSR) that can be queried at the service level.

6

Background - Goals

• Goals:
– Identify metadata queryable at the index service level
– Narrow focus for first model revision…
  • Initial model defined to satisfy discovery use cases
  • Support development of enhanced grid discovery client
  • Resolve runtime services for terminologies of interest
  • Additional metadata available through runtime services
– Allow/anticipate future expansion

7

Background – Use Cases

• Use Case Collection

– Identification

– Internationalization

– Intended/Allowed Usage

– Provenance

– Administration

8

Background - Use Cases

• Samples

– Browsing Ontologies

– Viewing Differences

– Administering the NCBO Ontology Library

– Web of Trust

9

Background - Use Cases

(1) Browsing
• An ontology developer is interested in creating an ontology for a domain (e.g., radiographic anatomy).
– Determines whether similar ontologies already exist in that domain.
– Evaluates assigned categories for registered ontologies.
– Discovers a match for “anatomy”.

If there is a match:
– Views available titles and descriptions.
– Finds listings for “human” and “mouse” anatomy, but not “radiology”.
– Looks at the human anatomy ontology to see if it fits the need.

Attributes: category, title (preferred name), description
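The browsing steps above can be sketched as a simple category filter. This is a minimal illustration, not the caGrid discovery API: the `TerminologyEntry` record, the `browse_by_category` helper and the sample entries are all hypothetical, and only the attribute names (category, title, description) come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TerminologyEntry:
    # Hypothetical registry record; fields mirror the slide's attributes.
    title: str                      # preferred name
    description: str
    category: List[str] = field(default_factory=list)


def browse_by_category(entries: List[TerminologyEntry],
                       wanted: str) -> List[Tuple[str, str]]:
    """Return (title, description) for entries tagged with `wanted`."""
    wanted = wanted.lower()
    return [(e.title, e.description)
            for e in entries
            if any(c.lower() == wanted for c in e.category)]


# Made-up sample registry, for illustration only.
registry = [
    TerminologyEntry("Human Anatomy", "Adult human anatomy", ["anatomy"]),
    TerminologyEntry("Mouse Anatomy", "Adult mouse anatomy", ["anatomy"]),
    TerminologyEntry("Protein Families", "Protein classes", ["proteomic"]),
]

matches = browse_by_category(registry, "anatomy")
for title, description in matches:
    print(f"{title}: {description}")
```

A search for a category with no registered entries (here, "radiology") returns an empty list, which corresponds to the "no match" branch of the use case.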

10

Background - Use Cases

(2) Viewing Differences
• An ontology developer wants to view what has changed between two versions of an ontology.
– Retrieves a listing of registered terminology services
– Sorts by URI, then version
– Selects and resolves grid services for the differing versions
– Invokes runtime services to resolve and compare content

Attributes: uri (id), version
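The "sort by URI, then version" step could look roughly like this. It is only a sketch: the tuple layout, the example URIs and the service URLs are invented, and versions are compared as plain strings, which is an assumption that breaks for identifiers such as "10.0" versus "2.0".

```python
from itertools import groupby

# Made-up registry rows: (uri, version, service_url).
services = [
    ("urn:example:anatomy", "2.0", "https://grid.example.org/anatomy-v2"),
    ("urn:example:genes", "1.0", "https://grid.example.org/genes-v1"),
    ("urn:example:anatomy", "1.0", "https://grid.example.org/anatomy-v1"),
]


def versions_by_uri(rows):
    """Sort by URI, then version, and group the rows per URI."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    return {uri: list(group)
            for uri, group in groupby(ordered, key=lambda r: r[0])}


def comparable(rows):
    """Keep only URIs registered with more than one version."""
    return {uri: group
            for uri, group in versions_by_uri(rows).items()
            if len(group) > 1}


candidates = comparable(services)
for uri, group in candidates.items():
    print(uri, [version for _, version, _ in group])
```

The service URLs of the selected versions would then be handed to the runtime services, which do the actual content comparison.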

11

Background - Use Cases

(3) Discovery
• A user wants to contact the providers of new ontologies registered within the last quarter.
– Queries registered ontologies by registration date
– Pulls point-of-contact information (source, curator, registration authority) from listed items

Attributes: registration date, registration authority, source, curator
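A date-based query of this kind might be sketched as follows. The `Registration` record, the `contacts_since` helper and the sample rows are hypothetical, and "last quarter" is approximated as the preceding 90 days, which is an assumption rather than anything the model defines.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Registration:
    # Hypothetical record; fields follow the attributes on the slide.
    name: str
    registration_date: date
    registration_authority: str
    source: str
    curator: str


def contacts_since(entries, since):
    """Return contact details for entries registered on or after `since`."""
    return [(e.name, e.source, e.curator, e.registration_authority)
            for e in entries
            if e.registration_date >= since]


# Made-up sample data, for illustration only.
entries = [
    Registration("Ontology A", date(2008, 10, 1),
                 "Authority X", "Source X", "Curator X"),
    Registration("Ontology B", date(2008, 3, 15),
                 "Authority Y", "Source Y", "Curator Y"),
]

today = date(2008, 11, 16)
cutoff = today - timedelta(days=90)   # rough stand-in for "last quarter"
recent = contacts_since(entries, cutoff)
for row in recent:
    print(row)
```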

12

Background - Use Cases

(4) Web of Trust
• Quality of ontologies:
– User is aware that there are several anatomy ontologies and is unsure which to use.
– Trusts certain ontology sources (anatomists) more than others
– Views ontology source to determine content origin
– Views intended and example use to consider alignment with the application
– Considers caBIG certification level

Attributes: source, intended use, example use, certification level
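One way to picture the trust decision is to filter by trusted source and then order by certification level. The sketch below is illustrative only: the dictionary layout, the `rank_trusted` helper, the numeric ranks and the sample entries are assumptions; the bronze/silver/gold levels are the ones named on the Certification slide.

```python
# Assumed ordering of the caBIG levels named in the deck.
CERT_RANK = {"bronze": 1, "silver": 2, "gold": 3}


def rank_trusted(candidates, trusted_sources):
    """Keep candidates from trusted sources, highest certification first."""
    trusted = [c for c in candidates if c["source"] in trusted_sources]
    return sorted(trusted,
                  key=lambda c: CERT_RANK.get(c["certification"], 0),
                  reverse=True)


# Made-up candidates, for illustration only.
candidates = [
    {"title": "Anatomy A", "source": "Anatomy Society",
     "certification": "silver"},
    {"title": "Anatomy B", "source": "Unknown Vendor",
     "certification": "gold"},
    {"title": "Anatomy C", "source": "Anatomy Society",
     "certification": "gold"},
]

ranked = rank_trusted(candidates, {"Anatomy Society"})
for c in ranked:
    print(c["title"], c["certification"])
```

Intended use and example use are free text in the model, so they are left for the user to read rather than scored here.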

13

Background – Model

• Focus of work on …
– Model alignment
  • External … Incorporate feedback from review and alignment with relevant specifications and standards.
  • Internal … Take better advantage of previously registered models and classes.
– Incorporating specific feedback on model classes and attributes.

14

Background - Alignment

• Specifications/standards considered …
– Dublin Core
– ISO 11179-2/3/6: classification, registries, admin
– LexGrid/LexBIG model
– National Center for Biomedical Ontology (NCBO) BioPortal
– Public Health Information Network (CDC/PHIN)
– Simple Knowledge Organization System (SKOS core)
– UMLS Rich Release Format (RRF)
– CTS/CTS2

15

Background – Model Alignment

16

Background – Model Alignment

17

Background – Model Alignment

• Findings …
– No silver bullet
– General alignment for defined items
  • All SWG items and definitions represented conceptually in one or more specifications
  • Adequate, but not perfect, alignment of semantics
  • Some name changes
– Some new attributes identified
  • Supplement existing use cases
  • Generally not found to be required unless we add use cases

18

Model - Overview

19

Model - AdministeredComponent

● Administered Component - a common superclass based on the ISO 11179 standard, providing typical attributes relevant to any component registered and maintained on the grid (e.g. id, preferredName, preferredDefinition)

Class diagram (Domain Model): domain::AdministeredComponent

20

Model – Core Identification & Description

● localName (1..n)
– Name used to refer to the terminology within a localized context; often a mnemonic.
– ICD-9-CM, ICD-9

● category (0..n)
– Applicable domains or scientific fields.
– e.g. anatomy, genomic, proteomic, phenotype…

Class diagram (Domain Model): Terminology
+ category: java.lang.String [0..n]
+ defaultLanguage: java.lang.String = en
+ keyword: java.lang.String [0..n]
+ localName: java.lang.String [1..n]
+ structure: StructureType
+ supportedContentType: java.lang.String [1..n]
+ supportedLanguage: java.lang.String [1..n] = en
+ type: TerminologyType
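The multiplicities in the Terminology class box can be read as simple constraints. The sketch below maps the string attributes to a Python dataclass and checks the [1..n] ones. It is an illustration under assumptions: the snake_case names and the validation behaviour are invented, the `java.lang.String` types are mapped to `str`, and `structure` and `type` are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Terminology:
    # [1..n] attributes without a default in the diagram: required here.
    local_name: List[str]
    supported_content_type: List[str]
    # [0..n] attributes: may be empty.
    category: List[str] = field(default_factory=list)
    keyword: List[str] = field(default_factory=list)
    # Defaults taken from the diagram ("= en").
    default_language: str = "en"
    supported_language: List[str] = field(default_factory=lambda: ["en"])

    def __post_init__(self):
        # Enforce the lower bound of 1 on the [1..n] attributes.
        for name in ("local_name", "supported_content_type",
                     "supported_language"):
            if not getattr(self, name):
                raise ValueError(f"{name} requires at least one value")


icd9 = Terminology(local_name=["ICD-9-CM", "ICD-9"],
                   supported_content_type=["text/plain"])
print(icd9)
```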

21

Model – Core Identification & Description

● type (0..1)
– Nature of content relative to the category.
– application – describes a domain in an application-dependent manner
– core – describes a domain in an application-independent manner
– domain – describes the most important concepts in a domain
– task – describes generic types of tasks or activities (e.g. selling, selecting)
– upperLevel – describes general, domain-independent concepts (e.g. space, time)

● structure (1)
– Indicates complexity of maintained relationships
– flat – no hierarchy
– simple – supports a single-inheritance, mono-hierarchical structure
– complex – supports multiple relationships and/or relationship types
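The value sets for `type` and `structure` lend themselves to enumerations. This is a sketch only: the slides name the values but not how TerminologyType and StructureType are implemented, and the `supports_hierarchy` helper is a hypothetical convenience.

```python
from enum import Enum


class TerminologyType(Enum):
    APPLICATION = "application"   # application-dependent description
    CORE = "core"                 # application-independent description
    DOMAIN = "domain"             # most important concepts in a domain
    TASK = "task"                 # generic tasks or activities
    UPPER_LEVEL = "upperLevel"    # general, domain-independent concepts


class StructureType(Enum):
    FLAT = "flat"                 # no hierarchy
    SIMPLE = "simple"             # single-inheritance mono-hierarchy
    COMPLEX = "complex"           # multiple relationships and/or types


def supports_hierarchy(structure: StructureType) -> bool:
    """True for any structure other than flat (hypothetical helper)."""
    return structure is not StructureType.FLAT


print(StructureType("simple"), supports_hierarchy(StructureType.FLAT))
```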

22

Model – Core Identification & Description

● defaultLanguage (1)
– Language for text unless otherwise specified
– eng

● supportedLanguage (1..n)
– Languages supported for text-based content
– eng, spa, …

● supportedContentType (1..n)
– Supported type of text or embedded multimedia
– e.g. MIME type (text/plain, image)

● keyword (0..n)
– Words or phrases of special significance.
– patient record, nursing protocol, …

23

Model - Usage

● intendedUse (0..n)
– Human-readable description of intended use.
– data integration

● exampleUse (0..n)
– Human-readable example of use.
– Integration of protein data.

● isRestricted (1)
– Indication of intellectual property boundaries.
– true

● rights (0..n)
– Human-readable description of IP rights.
– NCI Thesaurus terms of use …

Class diagram (Domain Model): TerminologyUsage
+ exampleUse: java.lang.String [0..n]
+ intendedUse: java.lang.String [0..n]
+ isRestricted: IsRestrictedType
+ rights: java.lang.String [0..n]

24

Model - Provenance

● releaseDate (0..1)
– Date of availability in released format.
– 2007-08-30

● releaseFormat (0..1)
– Format as released by the curator.
– e.g. OWL, OBO, RRF

● source (0..1)
– Origin or provider of content
– National Center for Health Statistics (NCHS)

● releaseLocation (0..1)
– Location of resource in the releaseFormat.
– ftp://ftp1.nci.nih.gov/pub/cacore/EVS/NCI_Thesaurus/Thesaurus_07.12a.OWL.zip

● releaseVersion (0..1)
– Represented version identifier.
– 2007

Class diagram (Domain Model): TerminologyProvenance
+ releaseDate: Date [0..1]
+ releaseFormat: java.lang.String [0..1]
+ releaseLocation: java.lang.String [0..1]
+ releaseVersion: java.lang.String [0..1]
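Since every provenance attribute is [0..1], a natural reading is a record whose fields are all optional. The sketch below assumes Python names and types for the four attributes in the class box; the values are the slide's per-attribute examples, combined into one record purely for illustration and not a claim about any actual release.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TerminologyProvenance:
    # All attributes are [0..1] in the diagram, hence Optional.
    release_date: Optional[date] = None
    release_format: Optional[str] = None
    release_location: Optional[str] = None
    release_version: Optional[str] = None


example = TerminologyProvenance(
    release_date=date.fromisoformat("2007-08-30"),
    release_format="OWL",
    release_location=("ftp://ftp1.nci.nih.gov/pub/cacore/EVS/"
                      "NCI_Thesaurus/Thesaurus_07.12a.OWL.zip"),
    release_version="2007",
)

unknown = TerminologyProvenance()   # nothing recorded is still valid
print(example.release_date.year, unknown.release_format)
```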

25

Model - Certification

● type
– e.g. Good Housekeeping or caBIG’s level of compliance

● value
– e.g. Gold seal of approval or caBIG’s bronze, silver, gold

Class diagram (Domain Model): Certification
+ type: java.lang.String
+ value: java.lang.String

26

Model - ReleasePackage

● name
– Meta distribution containing the terminology as released, e.g. UMLS

● version
– Identifier of the composite ontology or meta distribution containing the terminology as released, e.g. 2007AB

Class diagram (Domain Model): ReleasePackage
+ name: java.lang.String
+ version: java.lang.String

27

Model - Contact

● A common class used to maintain contact information (e.g. name, address, phone) as defined by the ISO 11179 standard

Class diagram (Domain Model): domain::AdministeredComponentContact

28

Model – Anticipated Alignment against available classes

Superclasses based on ISO 11179

29

Next Steps

● Currently, model harmonization with recommended superclasses is complete, but additional changes are anticipated in the registration process using caBIG™ tools

● Change caGrid tooling to capture additional metadata when registering a terminology

● Create custom discovery client for terminology services, to take advantage of additional metadata in support of the identified use cases

30

Questions/Discussion

● Links:
– https://cabig.nci.nih.gov/
– http://gforge.nci.nih.gov/projects/termeta
– http://cdebrowser.nci.nih.gov/CDEBrowser/ICB/

infrastructure/cacore_overview/cadsr


