RDA Wheat Data Interoperability Cookbook and last developments

9th March 2015, San Diego

The Wheat Data Interoperability (WDI) working group in brief

Endorsed by RDA in March 2014

Members: about 30, of whom 15 are active (wheat scientists, data and metadata technologists)

Goal: contribute to improving the interoperability of wheat-related data by
  building a common interoperability framework (metadata, data formats and vocabularies);
  providing guidelines for describing, representing and linking wheat-related data.


Initial plans

Deliverables:
● A report of the survey of existing standards
● A cookbook for the wheat data manager community, giving guidelines on which data formats, metadata, vocabularies and ontologies to use to describe, represent and link different types of wheat data
● A library of linked vocabularies and ontologies in machine-readable formats that follow the Linked Data standards
● A prototype that showcases the interoperability gain
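To make "linked vocabularies in machine-readable formats" concrete, here is a minimal sketch, assuming the library uses SKOS concepts serialized as N-Triples (the slides do not say which Linked Data vocabulary or serialization is used). It relies on the standard library only; the concept URIs are hypothetical `example.org` placeholders, not terms from any real wheat vocabulary.

```python
# Sketch: describe one vocabulary concept and its cross-vocabulary mapping
# as N-Triples. SKOS and N-Triples are assumptions; all example.org URIs
# are hypothetical placeholders.

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str, lang: str) -> str:
    """Build a language-tagged literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_triples(concept: str, label: str, exact_matches: list) -> list:
    """Return N-Triples lines for one skos:Concept and its exact matches."""
    lines = [
        f"{iri(concept)} {iri(RDF_TYPE)} {iri(SKOS + 'Concept')} .",
        f"{iri(concept)} {iri(SKOS + 'prefLabel')} {literal(label, 'en')} .",
    ]
    for target in exact_matches:
        # skos:exactMatch links the concept to an equivalent concept
        # in another vocabulary, which is what makes the library "linked".
        lines.append(f"{iri(concept)} {iri(SKOS + 'exactMatch')} {iri(target)} .")
    return lines


if __name__ == "__main__":
    for line in concept_triples(
        "http://example.org/wheat-vocab/plant-height",  # hypothetical URI
        "plant height",
        ["http://example.org/other-vocab/height"],  # hypothetical URI
    ):
        print(line)
```

A dedicated RDF library would handle the full escaping rules; plain string building keeps the sketch dependency-free.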

Where we are

Wheat-related standards survey and workshop

Data formats currently used (standardized, tool specific, non standardized) and recommendations, by data type

SNPs
  Currently used: VCF (standardized); BAM/SAM, BED, VARSCAN, VEP (tool specific)
  Recommendation: VCF files generated from the IWGSC survey sequences, plus metadata about the VCF files to enrich the information about the SNPs

Genome annotations
  Currently used: GenBank flat file, General Feature Format (GFF), EMBL
  Recommendation: GFF3, plus specifications describing the content of specific columns

Germplasm
  Currently used: MCPD, ABCD, Darwin Core, Darwin Core Germplasm (standardized); GRIN-Global (tool specific); tabulated files (non standardized)
  Recommendation: MCPD

Gene expression
  Currently used: many format standards laid out by repositories such as NCBI (GEO) and EBI ArrayExpress
  Recommendation: the existing format standards of repositories such as NCBI (GEO) and EBI ArrayExpress, plus ENA

Physical maps
  Currently used: GFF (standardized); CMap, FPC (tool specific)
  Recommendation: GFF3

Genetic maps
  Currently used: CMap, gnpmap
  Recommendation: GFF3 (to be confirmed)

Phenotypes
  Currently used: Drops, ped, ISA-Tab, Ephesis; tabulated files (non standardized)
  Recommendation: ISA-Tab
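The GFF3 and VCF recommendations are easier to apply once one sees what a record holds. A minimal sketch, using only the Python standard library, that splits one GFF3 feature line into the nine columns defined by the GFF3 specification and one VCF data line into its eight fixed columns. The sequence names, IDs and values in the examples are made up, and the cookbook's own column-level specifications are not reproduced here.

```python
from urllib.parse import unquote

# Column names from the GFF3 and VCF specifications.
GFF3_COLUMNS = ["seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes"]
VCF_FIXED = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attrs = {}
    if record["attributes"] != ".":
        # Column 9 holds tag=value pairs separated by ';'; a value may list
        # several comma-separated items, each percent-encoded.
        for pair in record["attributes"].split(";"):
            if not pair:
                continue
            tag, _, value = pair.partition("=")
            attrs[tag] = [unquote(v) for v in value.split(",")]
    record["attributes"] = attrs
    return record


def parse_vcf_line(line):
    """Split one VCF data line into its eight fixed columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(fields)}")
    record = dict(zip(VCF_FIXED, fields[:8]))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")  # multiple alternate alleles
    info = {}
    if record["INFO"] != ".":
        for entry in record["INFO"].split(";"):
            key, sep, value = entry.partition("=")
            info[key] = value if sep else True  # flags carry no value
    record["INFO"] = info
    return record


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    print(parse_gff3_line("chr3B\texample\tgene\t1000\t2500\t.\t+\t.\tID=gene0001;Name=example%20gene"))
    print(parse_vcf_line("chr3B\t1200\t.\tA\tG,T\t50\tPASS\tDP=14;DB"))
```

Real pipelines would normally use an established parser; the point here is only to show which fields each recommended format carries.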

Examples of use cases

Title: Searching for germplasm with specific traits

Description: Search for germplasm with specific traits, possibly tagged with ontology terms.

Data types: Germplasm, Phenotype

Challenges:
● Metadata is very important and should follow a standardized format
● Association of genes to traits, linked to germplasm and marker information
● Need for quality controls: how confident are you of the data source?
● Provenance of the germplasm: pedigree, ownership
● A standard system for tracking germplasm and names
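Searching germplasm "tagged with ontology terms" amounts to indexing each accession by the trait terms attached to it and intersecting the matches. A minimal sketch under that assumption: the record fields, accession IDs and `EX:` term IDs are hypothetical stand-ins for real germplasm passport data and trait ontology terms.

```python
from dataclasses import dataclass, field


@dataclass
class Germplasm:
    accession: str  # hypothetical accession identifier
    name: str
    trait_terms: set = field(default_factory=set)  # trait ontology term IDs
    pedigree: str = ""  # provenance, kept as free text in this sketch


def build_trait_index(records):
    """Map each trait term ID to the germplasm records tagged with it."""
    index = {}
    for rec in records:
        for term in rec.trait_terms:
            index.setdefault(term, []).append(rec)
    return index


def find_by_traits(index, required_terms):
    """Return sorted accessions tagged with every required term."""
    if not required_terms:
        return []
    matches = [{r.accession for r in index.get(t, [])} for t in required_terms]
    return sorted(set.intersection(*matches))


if __name__ == "__main__":
    records = [
        Germplasm("ACC-1", "line A", {"EX:0001", "EX:0002"}),
        Germplasm("ACC-2", "line B", {"EX:0001"}),
    ]
    index = build_trait_index(records)
    print(find_by_traits(index, ["EX:0001", "EX:0002"]))
```

Using shared ontology term IDs rather than free-text trait names is what lets the same query run across data sources; the quality-control and provenance challenges listed above are not addressed here.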

Title: Identification of wheat genes that control root growth

Description: Requires annotated genes (Gene Ontology, Pfam and other functional annotation).

Data types: Possibly genomic annotations and gene location (IWGS-SS ID or MIPS HCS link)

Challenges:
● Mapping between wheat genes and orthologs from other species (deducing function by sequence similarity)
● Access to RNA-Seq data (genes that are not expressed in roots may be irrelevant)
● Mapping wheat genes to information on their function from the literature
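The first two challenges combine naturally: transfer functional annotations from orthologs, then drop genes with no root expression. A minimal sketch of that filter, assuming the ortholog table, the ortholog annotations and per-gene root expression values are already available as dictionaries. Gene IDs, the `EX:root_growth` term and the `min_level` threshold are all hypothetical.

```python
def candidate_root_genes(orthologs, annotations, root_expression,
                         target_term, min_level=1.0):
    """Return wheat genes whose orthologs carry target_term and that are
    expressed in roots at or above min_level.

    orthologs:       wheat gene ID -> list of ortholog gene IDs (other species)
    annotations:     ortholog gene ID -> set of functional term IDs
    root_expression: wheat gene ID -> root expression value (e.g. from RNA-Seq)
    """
    candidates = []
    for wheat_gene, partners in orthologs.items():
        # Annotation transfer: inherit every term of every ortholog.
        inherited = set()
        for partner in partners:
            inherited |= annotations.get(partner, set())
        if target_term not in inherited:
            continue
        # Genes not expressed in roots are treated as irrelevant.
        if root_expression.get(wheat_gene, 0.0) < min_level:
            continue
        candidates.append(wheat_gene)
    return sorted(candidates)


if __name__ == "__main__":
    orthologs = {"TaG1": ["AtA"], "TaG2": ["AtB"], "TaG3": ["AtA", "AtB"]}
    annotations = {"AtA": {"EX:root_growth"}, "AtB": {"EX:other"}}
    expression = {"TaG1": 5.0, "TaG2": 10.0, "TaG3": 0.2}
    print(candidate_root_genes(orthologs, annotations, expression, "EX:root_growth"))
```

Inheriting every ortholog term is deliberately permissive; a real analysis would weigh sequence similarity and literature evidence, as the challenges note.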

Title: Query on trial data associated with varieties

Description: Search for wheat varieties with distribution maps, production figures, performance in wheat mega-environments and associated projects worldwide, plus layers of climatic data for specific wheat production areas and disease prevention information.

Data types: Phenotypic data, GIS data (wheat economy/production data)

Challenges: Phenotypic data should be linked to GIS data. Using keywords or ontology terms, a system or tool should be able to pull this information from the different websites and systems developed by the wheat community.
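Linking phenotypic data to GIS data needs, at minimum, a shared trial-site identifier that carries coordinates. A minimal sketch under that assumption: observations are joined to sites by a hypothetical `site_id` and filtered by trait term and a latitude/longitude bounding box. All identifiers and values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    site_id: str
    latitude: float
    longitude: float


@dataclass(frozen=True)
class Observation:
    variety: str
    site_id: str
    trait: str  # ideally an ontology term ID rather than a free-text name
    value: float


def observations_in_area(observations, sites, bbox, trait):
    """Join observations to site coordinates and keep those for `trait`
    inside bbox = (min_lat, max_lat, min_lon, max_lon)."""
    min_lat, max_lat, min_lon, max_lon = bbox
    site_by_id = {s.site_id: s for s in sites}
    results = []
    for obs in observations:
        site = site_by_id.get(obs.site_id)
        if site is None or obs.trait != trait:
            continue  # unlinked observation or different trait
        if min_lat <= site.latitude <= max_lat and min_lon <= site.longitude <= max_lon:
            results.append((obs.variety, site.latitude, site.longitude, obs.value))
    return results


if __name__ == "__main__":
    sites = [Site("S1", 10.0, 20.0), Site("S2", 50.0, 5.0)]
    observations = [Observation("V1", "S1", "EX:yield", 3.5),
                    Observation("V2", "S2", "EX:yield", 6.1)]
    print(observations_in_area(observations, sites, (0.0, 30.0, 0.0, 30.0), "EX:yield"))
```

Observations whose site is unknown are silently dropped here, which is exactly the kind of missing link the challenge describes.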


Wheat-related ontologies and vocabularies survey


The objectives of the survey

● Assess the level of visibility and interoperability of wheat-related vocabularies and ontologies:
  Is the vocabulary/ontology updated regularly?
  What license and/or copyright is used?
  Is the vocabulary/ontology part of any ontology communities or listing services?
  Is the vocabulary/ontology used or implemented in any database/repository?
  Does the vocabulary/ontology interlink and/or map to other vocabularies and ontologies?
  Does the vocabulary/ontology
● Identify the domain covered by the ontologies and vocabularies
● Refine the cookbook
● Collect more interoperability use cases
● Collect some technical details


The wheat-related BioPortal lets users search for terms across multiple ontologies, browse mappings between terms in different ontologies, get recommendations on which ontologies are most relevant for a corpus, and annotate text with terms from ontologies.
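The public BioPortal exposes these functions through a REST API, including `/search` (query parameter `q`, optional `ontologies`), `/annotator` (parameter `text`) and `/recommender` (parameter `input`), authenticated with an `apikey` parameter. Assuming the wheat-related instance exposes the same API, a minimal sketch that only builds the request URLs, using the standard library. The base URL is a hypothetical placeholder, since the instance's address is not given here, and no request is actually sent.

```python
from urllib.parse import urlencode


def bioportal_url(base_url, endpoint, params, api_key):
    """Build a GET URL for a BioPortal-style REST endpoint."""
    query = dict(params)
    query["apikey"] = api_key  # key passed as a query parameter
    return f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}?{urlencode(query)}"


def search_url(base_url, term, ontologies, api_key):
    """Search for a term, optionally restricted to some ontology acronyms."""
    params = {"q": term}
    if ontologies:
        params["ontologies"] = ",".join(ontologies)
    return bioportal_url(base_url, "search", params, api_key)


def annotator_url(base_url, text, api_key):
    """Annotate free text with terms from the hosted ontologies."""
    return bioportal_url(base_url, "annotator", {"text": text}, api_key)


def recommender_url(base_url, corpus, api_key):
    """Ask which ontologies best cover a text corpus or keyword list."""
    return bioportal_url(base_url, "recommender", {"input": corpus}, api_key)


if __name__ == "__main__":
    base = "https://bioportal.example.org"  # hypothetical instance URL
    print(search_url(base, "plant height", ["EXAMPLE_ONT"], "YOUR_API_KEY"))
    print(annotator_url(base, "root growth in wheat", "YOUR_API_KEY"))
```

The returned URLs can be fetched with any HTTP client; responses are JSON in the public BioPortal.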


Next steps

● Metadata (harmonization, minimal metadata sets)
● Mappings
● Next workshop (summer 2015)
● Review and complete the recommendations
● Refine and complete the guidelines and best practices
● Finalize the repository of wheat-related vocabularies
● Implement the prototype

Thanks!

