The MGED Society Facilitating Data Sharing and Integration with Standards CTSA Omics Data Standards...

Date post: 28-Dec-2015

The MGED Society: Facilitating Data Sharing and Integration with Standards

CTSA Omics Data Standards Working Group

Chris Stoeckert

Dept. of Genetics and Penn Center for Bioinformatics, U. Penn School of Medicine

Philadelphia, PA

Goal: Develop integrated data repositories (e.g., genomics, transcriptomics) that also hold clinical data.

Integration requires standards:
• For efficient loading and access
• For data sharing

(Diagram: "Your data repository" and "Some other public repository")

MAGE-TAB for microarray and UHTS data
OBI for describing biomedical (including clinical) data

The MGED Society Mission

The MGED Society is an international organization of biologists, computer scientists, and data analysts that aims to facilitate biological and biomedical discovery through data integration.

Our approach is to promote the sharing of large data sets generated by high-throughput functional genomics technologies. Historically, MGED began with a focus on microarrays and gene expression data. However, the scope of MGED now includes data generated by any technology applied to genome-scale studies of gene expression, binding, modification, and other related applications.

Members of MGED work to establish standards for data quality, management, annotation, and exchange; to facilitate the creation of tools that leverage these standards; and to work with other standards organizations to promote the sharing of high-quality, well-annotated data within the life sciences and biomedical communities.

MGED Standards

• What information is needed for a microarray experiment?
– MIAME: Minimal Information About a Microarray Experiment. Brazma et al., Nature Genetics 2001
• How do you "code up" microarray data?
– MAGE-OM: MicroArray Gene Expression Object Model. Spellman et al., Genome Biology 2002
– MAGE-TAB: Rayner et al., BMC Bioinformatics 2006
• What words do you use to describe a microarray experiment?
– MO: MGED Ontology. Whetzel et al., Bioinformatics 2006

New MGED-Related Activities

• The MGED Society mission includes facilitating the deposition of functional genomics datasets (e.g., microarray studies) in public archives. In addition to addressing what data gets deposited and how, we are very much concerned that authors adhere to journal requirements for data deposition. Unfortunately, these requirements are not being sufficiently met, and important datasets are not accessible (see, for example, Ochsner et al., Nature Methods 2008).

• Therefore, we ask investigators who are seeking microarray and UHTS functional genomics datasets from studies published in journals that require deposition to contact us if they are unable to obtain them. We will then contact the authors on their behalf and inform the journal where the study was published. We will document the results on the MGED web site to assist others seeking the same dataset and to aid reviewers of related publications and grants.

• http://www.mged.org/wiki/index.php/Published_Dataset_Availability

New MGED-Related Activities

• UHTS submission to repositories
– Both ArrayExpress and NCBI GEO accept functional genomics experiment submissions generated by ultra-high-throughput sequencing (UHTS) technologies. ArrayExpress and GEO have entered into a metadata exchange agreement, so UHTS experiments will appear in both databases regardless of where they were submitted. This complements the exchange of the underlying raw data between the short read archives, SRA and ERA. Raw sequencing data submitted to ArrayExpress or GEO will be sent to ERA or SRA, respectively, so you do not need to submit to the sequence repositories separately.

– See Helen Parkinson (ArrayExpress) and Tanya Barrett (GEO) for details.
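
The routing rules above fit in a small lookup table. Below is a minimal Python sketch of those rules as the slide states them. Metadata becomes visible in both ArrayExpress and GEO. Raw reads go to ERA for ArrayExpress submissions and to SRA for GEO submissions. The function name and result fields are hypothetical, and the code does not call any real repository API.

```python
# Illustrative model of the UHTS submission routing described above.
# The mapping mirrors the slide text; it is not a real repository API.
RAW_DATA_ARCHIVE = {
    "ArrayExpress": "ERA",
    "GEO": "SRA",
}

def route_uhts_submission(repository: str) -> dict:
    """Return where metadata and raw reads end up for a UHTS submission."""
    if repository not in RAW_DATA_ARCHIVE:
        raise ValueError(f"unknown repository: {repository!r}")
    return {
        # Metadata exchange agreement: the experiment appears in both databases.
        "metadata_visible_in": sorted(RAW_DATA_ARCHIVE),
        # Raw reads are forwarded to the matching short read archive.
        "raw_data_archive": RAW_DATA_ARCHIVE[repository],
    }

print(route_uhts_submission("GEO"))
```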

New MGED-Related Activities

• UHTS Quality Working Group
– Marc Salit (NIST)
– Best practices for RNA-Seq
• Illumina (Solexa)
• Ambion (ABI SOLiD)

New Directions for MGED Standards

• What information is needed for a UHTS experiment?
– MINSEQE: Minimal Information about a high-throughput SEQuencing Experiment
– http://www.mged.org/minseqe/
• How do you annotate microarray and gene expression data?
– Annotare: tool to create MAGE-TAB
– http://code.google.com/p/annotare/
• What words do you use to describe an investigation?
– OBI: Ontology for Biomedical Investigations
– http://obi-ontology.org/

A draft proposal for the required Minimum Information about a high-throughput Nucleotide SeQuencing Experiment – MINSEQE (April 1, 2008)

• The description of the biological system and the particular states that are studied

• The sequence read data for each assay

• The 'final' processed (or summary) data for the set of assays in the study

• The experiment design including sample data relationships

• General information about the experiment

• Essential experimental and data processing protocols
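
A submission tool might use this checklist by flagging which of the six elements a draft submission still lacks. The Python below sketches that check. The short keys (`biological_system`, `read_data`, and so on) and the example draft are hypothetical labels chosen for illustration. The MINSEQE proposal above does not define them.

```python
# The six MINSEQE elements from the draft proposal above, keyed by
# hypothetical short names chosen for this sketch.
MINSEQE_ELEMENTS = {
    "biological_system": "Description of the biological system and states studied",
    "read_data": "Sequence read data for each assay",
    "processed_data": "'Final' processed (or summary) data for the set of assays",
    "experiment_design": "Experiment design including sample data relationships",
    "general_info": "General information about the experiment",
    "protocols": "Essential experimental and data processing protocols",
}

def missing_minseqe_elements(submission: dict) -> list[str]:
    """List MINSEQE elements that are absent or empty in a submission."""
    return [key for key in MINSEQE_ELEMENTS if not submission.get(key)]

# Hypothetical draft submission used only to exercise the check.
draft = {
    "biological_system": "Mouse liver, fed vs. fasted",
    "read_data": ["run1.fastq", "run2.fastq"],
    "protocols": ["RNA extraction", "read alignment"],
}
print(missing_minseqe_elements(draft))
# ['processed_data', 'experiment_design', 'general_info']
```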

Annotare: an open-source, standalone MAGE-TAB editor

MAGE-TAB Format

What’s MAGE-TAB?

• MAGE-TAB is a simple spreadsheet-based format consisting of two files:
– IDF (Investigation Description Format): describes the experiment design, contact details, experimental variables, and protocols
– SDRF (Sample and Data Relationship Format): a spreadsheet whose columns describe samples, annotations, protocol references, hybridizations, and data
• Linked data files (e.g., CEL files) are referenced by the SDRF
• For single-channel data, one SDRF row corresponds to one hybridization; for two-channel data, one row corresponds to one channel
• MAGE-TAB can also be used to annotate next-generation sequencing data

Where can I get MAGE-TAB files?

• ~10,000 MAGE-TAB files are available for download from ArrayExpress (GEO-derived and ArrayExpress data)

• caArray also provides MAGE-TAB files for download.

IDF file for E-TABM-34
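
The E-TABM-34 IDF appears on the slide only as a screenshot. The toy IDF below is not that file. It is a hypothetical minimal example that shows the layout: each tab-delimited row starts with a field name (such as `Investigation Title`, `Protocol Name`, or `SDRF File`), followed by one or more values. Here is a minimal Python parsing sketch. It assumes tab delimiters and treats rows that start with `#` as comments.

```python
import csv
import io

def parse_idf(text: str) -> dict[str, list[str]]:
    """Parse IDF text into {field name: [values]}."""
    fields: dict[str, list[str]] = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        # Skip blank rows and rows assumed to be '#' comments.
        if not row or not row[0].strip() or row[0].startswith("#"):
            continue
        values = list(row[1:])
        # Spreadsheet exports often leave trailing empty cells; drop them.
        while values and values[-1] == "":
            values.pop()
        fields[row[0].strip()] = values
    return fields

# Hypothetical toy IDF, not the E-TABM-34 file.
idf_text = (
    "Investigation Title\tToy expression study\n"
    "Experimental Factor Name\tgenotype\n"
    "Protocol Name\tP-EXTRACT-1\tP-HYB-1\t\n"
    "\n"
    "SDRF File\ttoy.sdrf.txt\n"
)
idf = parse_idf(idf_text)
print(idf["Protocol Name"])  # ['P-EXTRACT-1', 'P-HYB-1']
```

The `SDRF File` value is how an IDF points to its companion SDRF. A loader would normally read that file next.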

SDRF file for E-TABM-34
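
The E-TABM-34 SDRF is likewise shown only as an image. The hypothetical two-channel SDRF below illustrates the MAGE-TAB row rule: for two-channel data each row is one channel, so two rows (Cy3 and Cy5) share the same `Hybridization Name`. The sketch reads columns by position because headers such as `Protocol REF` may repeat in real SDRF files. The sample names and file names are made up.

```python
import csv
import io
from collections import defaultdict

def hybridization_channels(sdrf_text: str) -> dict[str, list[str]]:
    """Map each Hybridization Name to the labels (channels) of its rows."""
    rows = list(csv.reader(io.StringIO(sdrf_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    # Columns such as "Protocol REF" can repeat, so look up by position.
    hyb_col = header.index("Hybridization Name")
    label_col = header.index("Label")
    channels: dict[str, list[str]] = defaultdict(list)
    for row in body:
        if row:  # skip blank lines
            channels[row[hyb_col]].append(row[label_col])
    return dict(channels)

# Hypothetical two-channel SDRF: two rows (Cy3, Cy5) per hybridization.
sdrf_text = "\n".join([
    "Source Name\tLabeled Extract Name\tLabel\tHybridization Name\tArray Data File",
    "liver A\tA-cy3\tCy3\thyb1\thyb1.gpr",
    "liver B\tB-cy5\tCy5\thyb1\thyb1.gpr",
    "liver C\tC-cy3\tCy3\thyb2\thyb2.gpr",
    "liver D\tD-cy5\tCy5\thyb2\thyb2.gpr",
])
channels = hybridization_channels(sdrf_text)
print(len(channels), channels["hyb1"])  # 2 ['Cy3', 'Cy5']
```

For single-channel data, each hybridization would map to a one-element list, one row per hybridization.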

Annotare

Annotare: an open-source MAGE-TAB editor

• Annotare is an annotation tool for high throughput gene expression experiments in MAGE-TAB format. Biologists can describe their investigations with the investigators’ contact details, experimental design, protocols that were employed, references to publications, details of biological samples, arrays, and experimental data produced in the investigation.

Annotare Features

• Intuitive graphical user interface forms for editing
• Ontology support: a built-in ontology and web services connectivity to BioPortal
• Searchable standard templates
• Design wizard
• Validation module for syntactic and semantic checking
• Mac and Windows support

Annotare Features - Templates

Search, choose, and save templates

Annotare Features – Design Wizard

Define the species; common array designs and protocols can be preloaded

Ontology Support

Autocomplete using the preloaded EFO (Experimental Factor Ontology), or ontology term lookup at BioPortal
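
The slide does not describe how Annotare implements autocomplete. The Python below is a generic sketch of prefix completion over a preloaded list of term labels, using binary search on a sorted list. The labels are illustrative placeholders, not an extract of EFO. The BioPortal lookup path is not modeled.

```python
import bisect

class PrefixIndex:
    """Case-insensitive prefix lookup over a preloaded list of term labels."""

    def __init__(self, labels: list[str]):
        # Keep (lowercased, original) pairs sorted by the lowercased label.
        self._entries = sorted((label.lower(), label) for label in labels)
        self._keys = [key for key, _ in self._entries]

    def complete(self, prefix: str, limit: int = 10) -> list[str]:
        """Return up to `limit` labels starting with `prefix`, in sorted order."""
        p = prefix.lower()
        # Binary search to the first key >= prefix; matches are contiguous.
        start = bisect.bisect_left(self._keys, p)
        results = []
        for key, label in self._entries[start:]:
            if not key.startswith(p) or len(results) >= limit:
                break
            results.append(label)
        return results

# Illustrative labels only, not an actual EFO extract.
index = PrefixIndex(["cell line", "cell type", "cellular component", "organism part", "disease"])
print(index.complete("cell"))  # ['cell line', 'cell type', 'cellular component']
```

Sorting once at load time keeps each keystroke lookup logarithmic. That fits a preloaded ontology better than a linear scan would.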

Excel-like or form-driven annotation

Validation
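
The slide does not list the validation module's actual rules. As an illustration of the two kinds of checks, the Python below applies one syntactic rule and one semantic rule:

- Syntactic: every SDRF row has as many cells as the header.
- Semantic: every `Protocol REF` value names a protocol declared in the IDF.

Both rules and the input layout are assumptions for this sketch, not Annotare's rule set.

```python
def validate_mage_tab(protocol_names: list[str], sdrf_rows: list[list[str]]) -> list[str]:
    """Return error messages; an empty list means no problems were found.

    protocol_names: names declared in the IDF "Protocol Name" row.
    sdrf_rows: the SDRF as a list of rows, header first.
    """
    errors = []
    header, body = sdrf_rows[0], sdrf_rows[1:]
    ref_cols = [i for i, name in enumerate(header) if name == "Protocol REF"]
    declared = set(protocol_names)
    for line_no, row in enumerate(body, start=2):
        # Syntactic check: every row must have as many cells as the header.
        if len(row) != len(header):
            errors.append(f"row {line_no}: expected {len(header)} cells, got {len(row)}")
            continue
        # Semantic check: each Protocol REF must name a protocol from the IDF.
        for i in ref_cols:
            if row[i] and row[i] not in declared:
                errors.append(f"row {line_no}: undeclared protocol {row[i]!r}")
    return errors

# Hypothetical SDRF rows used only to exercise both checks.
sdrf = [
    ["Source Name", "Protocol REF", "Extract Name", "Protocol REF", "Hybridization Name"],
    ["sample 1", "P-EXTRACT-1", "extract 1", "P-HYB-1", "hyb1"],
    ["sample 2", "P-EXTRACT-1", "extract 2", "P-HYB-9", "hyb2"],
    ["sample 3", "P-EXTRACT-1", "extract 3"],
]
for message in validate_mage_tab(["P-EXTRACT-1", "P-HYB-1"], sdrf):
    print(message)
# row 3: undeclared protocol 'P-HYB-9'
# row 4: expected 5 cells, got 3
```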

Supporting Applications

• caArray upload

• ArrayExpress submissions

• SOFT-MAGE-TAB converter (for GEO)

• Similarity search: AnnotCompute (www.cbil.upenn.edu/RAD/php/annotCompute/)

• MeV data upload

• MAGE-TAB Bioconductor Import

• Generic Limpopo parser for MAGE-TAB
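
This talk does not describe AnnotCompute's scoring method. To illustrate the general idea of similarity search over experiment annotations, the sketch below swaps in a plain Jaccard index over sets of annotation terms. The experiment identifiers and terms are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_similar(query: set[str], experiments: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank experiments by Jaccard similarity of their annotation terms to the query."""
    scores = [(exp_id, jaccard(query, terms)) for exp_id, terms in experiments.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical experiment IDs and annotation terms.
experiments = {
    "EXP-1": {"Homo sapiens", "liver", "disease state"},
    "EXP-2": {"Mus musculus", "liver", "diet"},
    "EXP-3": {"Homo sapiens", "blood", "disease state"},
}
print(rank_similar({"Homo sapiens", "liver", "disease state"}, experiments))
# [('EXP-1', 1.0), ('EXP-3', 0.5), ('EXP-2', 0.2)]
```

Set overlap ignores how terms relate in an ontology. A production search might also credit matches between parent and child terms.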

Links

• Code and documentation - code.google.com/p/annotare

• Limpopo parser: sourceforge.net/projects/limpopo/

Annotare Acknowledgements

• Annotare: Catherine A. Ball, Tony Burdett, Junmin Liu, Emma K. Hastings, Michael Miller, Sarita Nair, Helen Parkinson, Ravi Shankar, Rashmi Srinivasa, Joseph White

NHGRI grant P41 HG003619

OBI – Ontology for Biomedical Investigations

• MGED is one of many communities contributing to OBI

• Whereas the MGED Ontology is primarily a controlled vocabulary for use with MAGE, OBI is a well-founded ontology with logical definitions and restrictions to be used for multiple purposes (e.g., database models, text mining, file annotation)

OBI and IAO (Information Artifact Ontology) classes are shown in blue. Classes imported from other external ontologies are shown in red. Some example subclasses, such as PCR product and cell culture, are included to illustrate the use of the class processed material.

Partial high-level structure of OBI classes
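
To make the subclass idea concrete, the sketch below encodes a few is_a links and walks them upward. Only processed material, PCR product, and cell culture come from the slide. The parent chain above processed material is a simplified assumption. OBI itself is an OWL ontology with multiple parents and logical restrictions, which this single-parent dictionary does not capture.

```python
# Simplified, hypothetical is_a links; only "processed material",
# "PCR product" and "cell culture" are taken from the slide.
IS_A = {
    "PCR product": "processed material",
    "cell culture": "processed material",
    "processed material": "material entity",
    "material entity": "entity",
}

def ancestors(cls: str) -> list[str]:
    """Follow is_a links upward from cls to the root."""
    chain = []
    while cls in IS_A:
        cls = IS_A[cls]
        chain.append(cls)
    return chain

def is_subclass_of(cls: str, parent: str) -> bool:
    """True if cls equals parent or reaches it through is_a links."""
    return cls == parent or parent in ancestors(cls)

print(ancestors("PCR product"))
# ['processed material', 'material entity', 'entity']
```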

OBI – Ontology for Biomedical Investigations

• OBI intends to be part of the OBO Foundry

• Interoperable with Gene Ontology, ChEBI, Phenotypic qualities (PATO), Cell Type (CL)…

• Learn more at http://purl.obolibrary.org/obo/obi

• OBI is available through browsers like the NCBO BioPortal

Measuring the glucose concentration in blood

From The OBI Consortium, The Ontology for Biomedical Investigations, under revision

An OBI representation of a MAGE-TAB file

Focus on where MO terms were used in E-TABM-34

Utility of these standards for CTSA?

Integration requires standards:
• Use Annotare to generate MAGE-TAB
• Use OBI, when possible, as the source of controlled terms and for modeling protocols, assays, and investigations

(Diagram: "Your data repository" and "Some other public repository")

MAGE-TAB for microarray and UHTS data
OBI for describing biomedical (including clinical) data

For more information see http://www.mged.org

More about standards at http://biostandards.info/
Follow us on Twitter @MGED_Society

MGED Meetings

• It’s about the science!

• Keeping up with the latest advances

• Making connections with potential collaborators

Thank you!

Questions?

