The European Bioinformatics Institute
MIAME and Ontologies for Sample Description
Helen Parkinson, Microarray Informatics Team
European Bioinformatics Institute
EMBO Course, October 2001
Talk Structure
ArrayExpress - a public database for microarray data and integration of ontologies
Ontologies for gene expression data
Submission and annotation tool
Problems of microarray data analysis
Size of the datasets
Different platforms: nylon, glass
Different technologies on platforms: oligo/spotted
Referencing external databases which are not stable
Sample annotation
Array annotation
Need for LIMS and for bioinformaticians
General MIAME principles
Recorded info should be sufficient to interpret and replicate the experiment
Information should be structured so that querying and automated data analysis and mining are feasible
A gene expression database from the data analyst’s point of view
[Figure: gene expression matrix of genes against samples, each cell holding a gene expression level; gene annotations attach to the genes and sample annotations to the samples]
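The matrix view can be sketched as a small data structure. This is a minimal illustration, assuming in-memory Python lists; the class name `ExpressionMatrix`, its methods, and all gene and sample names are hypothetical and not part of ArrayExpress. It shows why sample annotations need structure: selecting samples by an annotation value only works if that value is stored as a field rather than buried in free text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExpressionMatrix:
    """Genes x samples matrix with gene and sample annotation tables."""
    genes: List[str]
    samples: List[str]
    levels: List[List[float]]  # levels[i][j]: expression of genes[i] in samples[j]
    gene_annotations: Dict[str, Dict[str, str]] = field(default_factory=dict)
    sample_annotations: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def level(self, gene: str, sample: str) -> float:
        """Look up one cell of the matrix by gene and sample name."""
        return self.levels[self.genes.index(gene)][self.samples.index(sample)]

    def samples_where(self, key: str, value: str) -> List[str]:
        """Samples whose annotation `key` has exactly `value`."""
        return [s for s in self.samples
                if self.sample_annotations.get(s, {}).get(key) == value]

    def profile(self, gene: str, key: str, value: str) -> List[float]:
        """Expression of one gene across the samples selected by an annotation."""
        return [self.level(gene, s) for s in self.samples_where(key, value)]


# Hypothetical gene and sample names, for illustration only.
m = ExpressionMatrix(
    genes=["geneA", "geneB"],
    samples=["s1", "s2", "s3"],
    levels=[[1.0, 2.5, 0.8],
            [3.2, 3.0, 0.1]],
    sample_annotations={
        "s1": {"organism part": "thymus"},
        "s2": {"organism part": "liver"},
        "s3": {"organism part": "thymus"},
    },
)
print(m.profile("geneB", "organism part", "thymus"))  # [3.2, 0.1]
```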
Gene Annotation
Gene annotation can be given as links to gene sequence databases; GO can then be used on the analysis side (molecular function, biological process, cellular component)
MIAME is flexible: it allows many kinds of sequence identifiers, or even the sequence itself
In some cases it is more useful to include the actual sequence than an inaccurate identifier
In the end we will need a mapping from a gene list to all the spots on all arrays; this is non-trivial given the problems with gene names
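A minimal sketch of the gene-list-to-spots mapping, assuming a hand-made synonym table and two hypothetical array layouts (`arrayX`, `arrayY`). The function names and table contents are illustrative only; a real mapping would rely on sequence database identifiers or on the sequences themselves, as MIAME allows. The point is that every label printed on any array must first be resolved to one canonical name before spots can be collected per gene.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical synonym table: upper-cased alias -> canonical gene name.
# Canonical names map to themselves so that they are recognised too.
SYNONYMS = {"TP53": "TP53", "P53": "TP53", "LFS1": "TP53",
            "BAX": "BAX", "BCL2L4": "BAX"}

# Hypothetical array layouts: array name -> {spot id: gene label as supplied}.
ARRAYS = {
    "arrayX": {"A1": "p53", "A2": "BAX", "A3": "unknownEST"},
    "arrayY": {"B7": "BCL2L4", "B9": "TP53"},
}


def resolve(label: str) -> Optional[str]:
    """Canonical name for a label, or None if it is not recognised.
    Upper-casing is a simplification; symbol case conventions differ
    between species."""
    return SYNONYMS.get(label.upper())


def spots_for_genes(genes: List[str],
                    arrays=ARRAYS) -> Dict[str, List[Tuple[str, str]]]:
    """Map each requested gene to every (array, spot) whose label resolves to it."""
    index = defaultdict(list)
    for array_name, layout in arrays.items():
        for spot, label in layout.items():
            name = resolve(label)
            if name is not None:
                index[name].append((array_name, spot))
    result = {}
    for gene in genes:
        name = resolve(gene)
        result[gene] = list(index.get(name, [])) if name is not None else []
    return result


print(spots_for_genes(["p53", "BAX", "GAPDH"]))
```

Unrecognised labels (here `unknownEST`) silently drop out, which is exactly where an inaccurate identifier costs data and where a supplied sequence would help.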
Sample annotation
Gene expression data only have meaning in the context of detailed sample descriptions
If the data are to be interpreted by independent parties, sample information has to be stored in the database and be searchable
Controlled vocabularies and ontologies (species, cell types, compound nomenclature, treatments, etc.) are needed for unambiguous sample description
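As a small illustration of why controlled vocabularies help, the sketch below checks sample annotations against term lists and reports values that a database could not search reliably. The vocabularies are tiny hypothetical stand-ins for resources such as the NCBI taxonomy, and `check_sample` is not an existing API.

```python
from typing import Dict, List

# Hypothetical, heavily truncated vocabularies for illustration only.
VOCABULARIES = {
    "organism": {"Mus musculus", "Homo sapiens", "Drosophila melanogaster"},
    "sex": {"female", "male"},
    "organism part": {"thymus", "liver", "brain"},
}


def check_sample(annotation: Dict[str, str]) -> List[str]:
    """List annotation values that fall outside the controlled vocabularies."""
    problems = []
    for key, value in annotation.items():
        allowed = VOCABULARIES.get(key)
        if allowed is None:
            # No vocabulary for this field: it can only be stored as free text.
            problems.append(f"{key}: no controlled vocabulary, kept as free text")
        elif value not in allowed:
            problems.append(f"{key}: '{value}' is not a recognised term")
    return problems


print(check_sample({"organism": "Mus musculus", "sex": "F", "notes": "pooled"}))
```

An abbreviation such as `F` is unambiguous to a human reader but invisible to a query for `female`, which is the kind of mismatch the vocabularies are meant to prevent.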
Standardisation of microarray data and annotations: the MGED group
The goal of the group is to facilitate the adoption of standards for DNA-array experiment annotation and data representation, as well as the introduction of standard experimental controls and data normalisation methods. It includes most of the world’s largest microarray laboratories and companies (TIGR, Affymetrix, Stanford, Sanger, Agilent, etc.)
www.mged.org
Sample annotation: what can be done?
Build an ontology for gene expression data (MGED)
Use existing ontologies and link them in
Incorporate the ontology into the database
Develop internal editing tools for the ontology
Develop a browser or other interface for the ontology and link it to LIMS
Some use of free-text descriptions is unavoidable (curation workload)
Use case scenarios
Return a summary of all experiments that use a specified type of biosource (primary source).
Group the experiments according to treatment.
Return a summary of all experiments done examining effects of a specified treatment
Group the experiments according to biosource.
Return a summary of all experiments measuring the expression of a specified gene.
Indicate when experiments confirm results, provide new information, or conflict.
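The first three use cases reduce to filter-and-group queries once biosource, treatment and measured genes are stored as structured fields. The sketch below assumes a hypothetical `Experiment` record with made-up accessions; the fourth use case (detecting whether experiments confirm or conflict) needs a comparison of the expression data itself and is not attempted here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Experiment:
    accession: str          # hypothetical accession format
    biosource: str
    treatment: str
    genes: FrozenSet[str]   # genes measured in the experiment


def summarise(experiments, where: str, value: str, group_by: str) -> Dict[str, List[str]]:
    """Select experiments whose `where` field equals `value`, grouped by `group_by`."""
    groups = defaultdict(list)
    for e in experiments:
        if getattr(e, where) == value:
            groups[getattr(e, group_by)].append(e.accession)
    return dict(groups)


def measuring_gene(experiments, gene: str) -> List[str]:
    """Accessions of all experiments that measured `gene`."""
    return [e.accession for e in experiments if gene in e.genes]


EXPERIMENTS = [
    Experiment("E-1", "thymus", "dexamethasone", frozenset({"geneA"})),
    Experiment("E-2", "thymus", "none", frozenset({"geneA", "geneB"})),
    Experiment("E-3", "liver", "dexamethasone", frozenset({"geneB"})),
]

# Use case 1: experiments on a biosource, grouped by treatment.
print(summarise(EXPERIMENTS, "biosource", "thymus", "treatment"))
# Use case 2: experiments with a treatment, grouped by biosource.
print(summarise(EXPERIMENTS, "treatment", "dexamethasone", "biosource"))
# Use case 3: experiments measuring a given gene.
print(measuring_gene(EXPERIMENTS, "geneB"))
```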
MIAME – Minimum Information About a Microarray Experiment
[Figure: the 6 parts of a microarray experiment (Experiment, Array, Hybridisation, Sample, Data, Normalisation), with external links to Publication, Gene (e.g., EMBL) and Source (e.g., Taxonomy)]
www.mged.org
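One way to use the six parts is as a completeness check on a submission. This sketch assumes a submission is a plain dictionary keyed by hypothetical part names; it only tests that each part is present and non-empty, not whether its content satisfies MIAME.

```python
from typing import Dict, List

# The six parts named on the slide; the dictionary keys are assumptions.
MIAME_PARTS = ("experiment", "array", "sample", "hybridisation", "data", "normalisation")


def missing_parts(submission: Dict[str, object]) -> List[str]:
    """Return the MIAME parts that are absent or empty in a submission."""
    return [part for part in MIAME_PARTS if not submission.get(part)]


draft = {
    "experiment": {"design": "treated vs untreated"},
    "array": {"platform": "spotted glass"},
    "sample": {"organism": "Mus musculus"},
    "hybridisation": {},   # present but not yet filled in
    "data": ["hyb1_raw.txt"],
}
print(missing_parts(draft))  # ['hybridisation', 'normalisation']
```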
MGED Biomaterial (sample) Ontology
Under construction by Chris Stoeckert, using OilEd (though other tools exist)
Motivated by MIAME and coordinated with the database model
We will extend classes, provide constraints, define terms, provide new terms and develop CVs (controlled vocabularies) for submissions (EBI)
Part of the MGED biomaterial ontology
class Age
documentation: The time period elapsed since an identifiable point in the life cycle of an organism. If a developmental stage is specified, the identifiable point would be the beginning of that stage. Otherwise the identifiable point must be specified, such as planting.
type: primitive
superclasses: BiosourceProperty
constraints: slot-constraint has_measurement has-value Measurement
slot-constraint initial_time_point has-value one-of (planting beginning_of_stage)
used in slots: initial_time_point
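The class definition above can be read as a typed record with constraints. The sketch below is one possible Python reading, assuming our interpretation of the documentation: if a developmental stage is given, age is counted from `beginning_of_stage`, and `beginning_of_stage` without a stage is rejected. The `developmental_stage` field is a hypothetical addition; the other names mirror the ontology excerpt but the code is not generated from it.

```python
from dataclasses import dataclass
from typing import Optional

# Values permitted by the one-of constraint on initial_time_point.
ALLOWED_INITIAL_TIME_POINTS = {"planting", "beginning_of_stage"}


class BiosourceProperty:
    """Stand-in for the ontology superclass."""


@dataclass
class Measurement:
    value: float
    unit: str


@dataclass
class Age(BiosourceProperty):
    has_measurement: Measurement
    initial_time_point: str
    developmental_stage: Optional[str] = None  # hypothetical extra slot

    def __post_init__(self):
        # has_measurement has-value Measurement
        if not isinstance(self.has_measurement, Measurement):
            raise TypeError("has_measurement must be a Measurement")
        # initial_time_point has-value one-of (planting beginning_of_stage)
        if self.initial_time_point not in ALLOWED_INITIAL_TIME_POINTS:
            raise ValueError(
                f"initial_time_point must be one of {sorted(ALLOWED_INITIAL_TIME_POINTS)}")
        # Our reading of the documentation text, not a formal OIL constraint.
        if self.developmental_stage is not None and self.initial_time_point != "beginning_of_stage":
            raise ValueError("with a developmental stage, age counts from the beginning of that stage")
        if self.developmental_stage is None and self.initial_time_point == "beginning_of_stage":
            raise ValueError("beginning_of_stage needs a developmental stage")


seedling_age = Age(Measurement(14.0, "days"), "planting")
```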
MIAME Section on Sample Source and Treatment
organism (NCBI taxonomy)
cell source - provider
cell type (if derived from primary source(s))
sex
age
growth conditions
development stage
organism part (tissue)
animal/plant strain or line
genetic variation (e.g., gene knockout, transgenic variation)
individual
individual genetic characteristics (e.g., disease alleles, polymorphisms)
disease state or normal
target cell type
cell line and source (if applicable)
in vivo treatments (organism or individual treatments)
in vitro treatments (cell culture conditions)
treatment type (e.g., small molecule, heat shock, cold shock, food deprivation)
compound
is additional clinical information available (link)
separation technique (e.g., none, trimming, microdissection, FACS)
laboratory protocol for sample treatment……
Examples of usable external ontologies
NCBI taxonomy database
Jackson Lab mouse strains and genes
Edinburgh mouse atlas anatomy
HUGO nomenclature for human genes
Chemical and compound ontologies: Merck Index
TAIR
FlyBase
GO
Excerpts from a Sample Description (courtesy of M. Hoffman, S. Schmidtke, Lion BioSciences)
Organism: Mus musculus [ NCBI taxonomy browser ]
Cell source: in-house bred mice (contact: [email protected])
Sex: female [ MGED ]
Age: 3 - 4 weeks after birth [ MGED ]
Growth conditions: normal controlled environment; 20 - 22 °C average temperature; housed in cages according to EU legislation; specified pathogen free conditions (SPF); 14 hours light cycle; 10 hours dark cycle
[Developmental stage]: stage 28 (juvenile (young) mice) [ GXD "Mouse Anatomical Dictionary" ]
Organism part: thymus [ GXD "Mouse Anatomical Dictionary" ]
Strain or line: C57BL/6 [ International Committee on Standardized Genetic Nomenclature for Mice ]
Genetic variation: Inbr (J) 150. Origin: substrains 6 and 10 were separated prior to 1937. This substrain is now probably the most widely used of all inbred strains. Substrains 6 and 10 differ at the H9, Igh2 and Lv loci. Maint. by J, N, Ola. [ International Committee on Standardized Genetic Nomenclature for Mice ]
Treatment: in vivo [MGED] [intraperitoneal] injection of [Dexamethasone] into mice, 10 micrograms per 25 g body weight of the mouse
Compound: drug [MGED] synthetic [glucocorticoid] [dexamethasone], dissolved in PBS
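In this excerpt each value is tagged, in square brackets, with the ontology or authority it comes from. A minimal sketch of turning such lines into (field, value, source) triples, assuming the simple `Field: value [ source ]` layout; lines with inline bracketed terms, such as the Treatment and Compound lines, keep their brackets inside the value. The regular expression and `parse_annotation` are illustrative and not part of any MGED tool.

```python
import re
from typing import Optional, Tuple

# "Field: value [ source ]", with the trailing bracketed source optional.
ANNOTATION = re.compile(r"^\s*([^:]+):\s*(.*?)\s*(?:\[\s*([^\]]+?)\s*\])?\s*$")


def parse_annotation(line: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Split one annotation line into (field, value, source); None if there is no colon."""
    match = ANNOTATION.match(line)
    if match is None:
        return None
    field_name, value, source = match.groups()
    return field_name.strip(), value, source


for line in [
    "Organism: Mus musculus [ NCBI taxonomy browser ]",
    "Sex: female [ MGED ]",
    "Growth conditions: normal",
]:
    print(parse_annotation(line))
```

Keeping the source next to each value is what lets a database later check the value against the right vocabulary.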
Introduction to the database
ArrayExpress is implemented in Oracle
The submission tool is a separate implementation of the ArrayExpress model in MySQL
Faster and easier to update
A short-term solution to the problem of data submission
ArrayExpress conceptual model
[Figure: ArrayExpress conceptual model with Experiment, Array, Sample, Hybridisation, Data and Normalisation, and external links to Publication, Gene (e.g., EMBL) and Source (e.g., Taxonomy)]
[Figure: ArrayExpress system overview. Components shown: ArrayExpress database (MAGE-OM model); curation database; submission tool (user login, array submission, protocol submission, experiment submission); large-scale submissions in MAGE-ML format from submitter LIMS; query interface for public data; analysis tools (Expression Profiler); browse arrays; browse protocols; data file export to external applications; external databases (EMBL, ontology resources, etc.)]
MIAMExpress
Based on MIAME concepts and questionnaire
Experiment, array and protocol submissions
CV/ontology wherever possible
Future versions: organism-specific pages and related linked ontologies
Allow user-driven ontology development
Will be developed according to user needs
Will also need to be an update tool
Design Considerations
Speed and ease of use, scalability
Need to browse existing protocols and array designs in ArrayExpress
Requirement for curator control over submissions
Submissions tracking
Future use as a LIMS
Flexibility
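The requirements for curator control and submission tracking can be pictured as a small state machine in which only certain roles may move a submission between states, with every move recorded. The states, roles and transitions below are assumptions made for illustration, not MIAMExpress's actual workflow.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    CURATED = "curated"
    PUBLIC = "public"


# Hypothetical workflow: (from, to) -> roles allowed to make that move.
TRANSITIONS = {
    (Status.DRAFT, Status.SUBMITTED): {"submitter"},
    (Status.SUBMITTED, Status.DRAFT): {"curator"},    # returned to submitter for fixes
    (Status.SUBMITTED, Status.CURATED): {"curator"},
    (Status.CURATED, Status.PUBLIC): {"curator"},
}


class Submission:
    def __init__(self, submission_id: str):
        self.submission_id = submission_id
        self.status = Status.DRAFT
        self.history = []  # (old status, new status, role) for tracking

    def move(self, new_status: Status, role: str) -> None:
        """Change status if `role` is allowed to; otherwise raise PermissionError."""
        allowed = TRANSITIONS.get((self.status, new_status), set())
        if role not in allowed:
            raise PermissionError(
                f"{role} may not move {self.submission_id} "
                f"from {self.status.value} to {new_status.value}")
        self.history.append((self.status, new_status, role))
        self.status = new_status


sub = Submission("sub-001")
sub.move(Status.SUBMITTED, "submitter")
sub.move(Status.CURATED, "curator")
print(sub.status)  # Status.CURATED
```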
Features of MIAMExpress
Creates a user login account instead of on-the-fly submissions, so sessions can be saved
Allows existing protocols to be copied, saved and linked to more than one hybridisation/experiment
Forms the basis of a LIMS using the ArrayExpress model
Will be available as a stand-alone tool for local installation
Is open source and free
Will be supported by curation staff and developers
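Linking one saved protocol to several hybridisations is a many-to-many relation, and copying a protocol creates an independent row that can then be edited. A minimal sketch, assuming a relational store (SQLite stands in for MySQL so the example runs without a server); the table, column and function names are hypothetical and not taken from the ArrayExpress model.

```python
import sqlite3

SCHEMA = """
CREATE TABLE protocol (id INTEGER PRIMARY KEY, name TEXT NOT NULL, body TEXT NOT NULL);
CREATE TABLE hybridisation (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per (hybridisation, protocol) pair: a protocol can serve many hybridisations.
CREATE TABLE hyb_protocol (
    hyb_id INTEGER NOT NULL REFERENCES hybridisation(id),
    protocol_id INTEGER NOT NULL REFERENCES protocol(id),
    PRIMARY KEY (hyb_id, protocol_id)
);
"""


def copy_protocol(conn, protocol_id, new_name):
    """Save a copy of an existing protocol under a new name; return the new id."""
    cur = conn.execute(
        "INSERT INTO protocol (name, body) SELECT ?, body FROM protocol WHERE id = ?",
        (new_name, protocol_id),
    )
    if cur.rowcount != 1:
        raise KeyError(protocol_id)
    return cur.lastrowid


def link(conn, hyb_id, protocol_id):
    """Attach a saved protocol to a hybridisation."""
    conn.execute("INSERT INTO hyb_protocol (hyb_id, protocol_id) VALUES (?, ?)",
                 (hyb_id, protocol_id))


def hybs_using(conn, protocol_id):
    """Names of all hybridisations linked to a protocol."""
    rows = conn.execute(
        "SELECT h.name FROM hybridisation h "
        "JOIN hyb_protocol hp ON h.id = hp.hyb_id "
        "WHERE hp.protocol_id = ? ORDER BY h.name",
        (protocol_id,),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
labelling = conn.execute("INSERT INTO protocol (name, body) VALUES (?, ?)",
                         ("labelling v1", "example protocol text")).lastrowid
for hyb_name in ("hyb-1", "hyb-2"):
    hyb_id = conn.execute("INSERT INTO hybridisation (name) VALUES (?)",
                          (hyb_name,)).lastrowid
    link(conn, hyb_id, labelling)
labelling_v2 = copy_protocol(conn, labelling, "labelling v2")
print(hybs_using(conn, labelling))     # ['hyb-1', 'hyb-2']
print(hybs_using(conn, labelling_v2))  # []
```

Storing the link in its own table means a protocol is entered once and referenced many times, rather than repeated per hybridisation.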
Expected Users
Users with limited local bioinformatics support
Users of bought-in arrays without a LIMS
Small-scale users with self-made arrays, who will need to provide an array description
Descriptions of commercial arrays will be provided
Acknowledgments
Whole Microarray Informatics Team, EBI, esp. Alvis Brazma, Mohammad Shojatalab and Ugis Sarkans
Industry Support team, EBI
MGED steering committee
MIAME working group
Chris Stoeckert, U. Penn., and members of MGED
Demo Version of MIAMExpress
Coming soon to www.ebi.ac.uk/microarray
Beta tester recruitment