
Anatomy ontology evaluation @ ArrayExpress

Date post: 20-Mar-2016
Page 1: Anatomy ontology evaluation @ ArrayExpress

www.ebi.ac.uk/arrayexpress

EBI is an Outstation of the European Molecular Biology Laboratory.

Anatomy ontology evaluation @ ArrayExpress

Helen Parkinson, PhD

Page 2: Anatomy ontology evaluation @ ArrayExpress


Content

• ArrayExpress use cases
• Fuzzy matching of ontology terms
• Data driven ontology building
• Wish list

Page 3: Anatomy ontology evaluation @ ArrayExpress


ArrayExpress: Overview

[Overview diagram; recoverable labels: Submit, Hybs, Experiment queries, Public/Private, ATLAS, Summarize, Public Only, Re-annotate, Gene queries, Genes, Cross expt/species queries]

Page 4: Anatomy ontology evaluation @ ArrayExpress


Fuzzy matching of ontology terms – why?

• Clean up ArrayExpress OE and synonym tables
• OE based integration
• Constrain OEs on data entry/validation
• Improved searches in repository/DW web interface
• Data integration across species, experiments and experimental designs
• Automated mapping of free text to ontology terms for data import

Page 5: Anatomy ontology evaluation @ ArrayExpress


Phonetic Matching

• Precompute phonetic encodings of all terms in the ontology
• Match each target term by comparing these encodings
• Soundex: Robert Russell and Margaret Odell (1918), famously described by Donald Knuth
• Double Metaphone: Lawrence Philips (2000)
• Metaphone: Lawrence Philips

• Most matches are single
• Highest success rate
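
The precompute-then-compare idea can be sketched in Python using classic American Soundex. This is only an illustration, not the code used for ArrayExpress: the helper names (`soundex`, `encode_term`, `build_index`, `match`), the word-by-word handling of multi-word terms and the tiny term list are all assumptions made for the example.

```python
from collections import defaultdict

# Standard American Soundex digit groups.
_CODES = {}
for _digit, _letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of one word ('' if it has no letters)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate letters sharing a code
            prev = digit
    return (code + "000")[:4]


def encode_term(term: str) -> tuple:
    """Encode a (possibly multi-word) term word by word (an assumption of this sketch)."""
    return tuple(soundex(w) for w in term.split())


def build_index(terms):
    """Precompute the phonetic encoding of every ontology term."""
    index = defaultdict(list)
    for term in terms:
        index[encode_term(term)].append(term)
    return index


def match(target: str, index) -> list:
    """Return all ontology terms whose encoding equals the target's encoding."""
    return index.get(encode_term(target), [])


# Hypothetical ontology terms, for illustration only.
index = build_index(["liver", "kidney", "cerebral cortex", "hypothalamus"])
print(match("kidny", index))           # ['kidney']
print(match("cerebral cortx", index))  # ['cerebral cortex']
print(match("leaf", index))            # []
```

An empty result corresponds to "none" and a one-element result to a "single" match; more than one element would be a "multiple" match.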

Page 6: Anatomy ontology evaluation @ ArrayExpress


Algorithm comparisons

[Chart: SAEL vs. AE OrganismPart. Algorithms compared: Soundex, Metaphone, Double Metaphone, Levenshtein. Scale: 0% to 100%. Legend: none, multiple_bad, multiple_okay, single_bad, single_okay, valid]
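
For comparison with the phonetic encodings, Levenshtein matching can be sketched as an edit-distance search over the ontology terms. This is a minimal sketch under assumptions: the `max_distance` cutoff, the function names and the sample terms are invented for the example, and the okay/bad split in the legend presumably needs a human or gold-standard judgement, so it is not modelled here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def closest_terms(target: str, terms, max_distance: int = 2) -> list:
    """Return the terms at the smallest distance, if within max_distance (assumed cutoff)."""
    scored = [(levenshtein(target.lower(), t.lower()), t) for t in terms]
    if not scored:
        return []
    best = min(d for d, _ in scored)
    if best > max_distance:
        return []
    return [t for d, t in scored if d == best]


def classify(target: str, terms) -> str:
    """Coarse outcome label: 'valid', 'single', 'multiple' or 'none'."""
    if target.lower() in {t.lower() for t in terms}:
        return "valid"  # exact match, no fuzzy matching needed
    candidates = closest_terms(target, terms)
    if not candidates:
        return "none"
    return "single" if len(candidates) == 1 else "multiple"


# Hypothetical ontology terms, for illustration only.
terms = ["cerebral cortex", "heart", "ear", "liver"]
print(classify("liver", terms))           # valid
print(classify("cerebal cortex", terms))  # single
print(classify("hear", terms))            # multiple ('heart' and 'ear' are both one edit away)
print(classify("xyzzy", terms))           # none
```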

Page 7: Anatomy ontology evaluation @ ArrayExpress


Percent matches using automated mapping

Page 8: Anatomy ontology evaluation @ ArrayExpress


Failures to match

• Species (or Kingdom)-specific terms (e.g. plant anatomy)
• Conflated terms (e.g. diseased cell types)
• Compound terms (e.g. "cerebral cortex and hypothalamus")
• Genuinely missing terms
  • Esoteric terms less of a priority
• Most trivial misspellings, however, were matched
• Dirty input data

Page 9: Anatomy ontology evaluation @ ArrayExpress


Implications

• Need more terms in some commonly-used ontologies
• Synonyms are important
  • generating less noise
  • better coverage
• Choice of ontology can limit expressivity - this will be frustrating to biologists
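
The effect of synonyms on coverage can be illustrated with a small lookup table that maps every label, preferred or synonym, back to its preferred term. The terms, synonyms and input values below are made up for the example and the function names are hypothetical; the sketch only shows how coverage would be measured, not any real ArrayExpress figures.

```python
def normalise(label: str) -> str:
    """Lower-case and collapse whitespace so trivially different labels compare equal."""
    return " ".join(label.lower().split())


def build_lookup(ontology: dict, use_synonyms: bool = True) -> dict:
    """Map each normalised label to its preferred term.

    `ontology` maps preferred term -> list of synonyms (hypothetical structure).
    """
    lookup = {}
    for term, synonyms in ontology.items():
        labels = [term, *synonyms] if use_synonyms else [term]
        for label in labels:
            lookup.setdefault(normalise(label), term)
    return lookup


def coverage(values, lookup: dict) -> float:
    """Fraction of input values that resolve to some ontology term."""
    if not values:
        return 0.0
    hits = sum(1 for v in values if normalise(v) in lookup)
    return hits / len(values)


# Made-up ontology fragment and free-text annotations, for illustration only.
ontology = {"kidney": ["renal organ"], "liver": ["hepar"]}
values = ["Kidney", "hepar", "leaf", "renal  organ"]

print(coverage(values, build_lookup(ontology, use_synonyms=False)))  # 0.25
print(coverage(values, build_lookup(ontology, use_synonyms=True)))   # 0.75
```

Values that still fail to resolve, such as "leaf" here, are the candidates for new terms or for a different ontology.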

Page 10: Anatomy ontology evaluation @ ArrayExpress


Why?

• Clean up ArrayExpress OE and synonym tables
• Add accessions/DB links to these tables
• Constrain OEs on data entry/validation
• Improved searches in repository/DW web interface
• Generate suggestions for new OE terms
• Evaluate domain coverage by a given ontology

Page 11: Anatomy ontology evaluation @ ArrayExpress

ArrayExpress Ontology Development and Future Directions

24.04.23

Developing the Ontology

• Define Scope: ArrayExpress already has some useful structure given the current database plus rich source of use cases and competency questions.

• Build: Ontology Capture: Identify key concepts and relationships within our domain and give explicit definitions to these features:
  • Middle-out approach – specify core of basic terms then specialise and generalise as required

• Mappings – text mining approach to do initial semi-automated mappings to external resources for rapid coverage

• Manual mapping for data warehouse data, and selected data sets

Page 12: Anatomy ontology evaluation @ ArrayExpress


Capture to Code: Definitions and Hierarchy

Page 13: Anatomy ontology evaluation @ ArrayExpress


Semantic Roadmap

• Position of the ArrayExpress Experimental Factor Ontology in the ‘bigger picture’

[Diagram labels: AE Ontology, Disease Ontology, Common Anatomy Reference Ontology, Cell Type Ontology, Chemical Entities of Biological Interest (ChEBI), NCI, Various Species Anatomy Ontologies]

• Key is orthogonal coverage, reuse of existing resources and shared frameworks

Page 14: Anatomy ontology evaluation @ ArrayExpress


Wish list

• NOT to build our own anatomy ontology
• CARO extension
• CARO evaluation
• Mapping CARO to relevant multi-species ontologies
• Application of CARO to ArrayExpress data
• Use of CARO in ArrayExpress tools

Page 15: Anatomy ontology evaluation @ ArrayExpress


Acknowledgments

• Anna Farne
• Ele Holloway
• James Malone
• Margus Lukk
ArrayExpress Production Team
• Helen Parkinson
• Tim Rayner
• Faisal Rezwan
• Eleanor Williams
• Mengyao Zhao
• Holly Zheng

