Date post: | 14-Apr-2017 |
Category: | Data & Analytics |
Upload: | jyotikhadake |
DATA LIFE CYCLE MICROBES: CONSTRAINTS, IMMEDIATE AND LONG TERM
26TH OCTOBER, EMBL-ABR
Challenges for data
• Management
• Annotation
• Analysis
• Storage
• Sharing
Long term planning and maintenance
• Project Funding and continuity of data availability
• Funder/institutional requirements: data from public resources must be in the public domain.
• Availability of storage facilities
• Availability of analysis facilities
DMP (data management plan)
Data sharing: going beyond what is required
• Granting authorities
• Journal requirements
Facilitation by:
• Institutional sharing
• Availability of repositories/archives: GitHub, EBI repositories, NCBI repositories, institutional and national data archives
• Analysis, annotation and advertisement of the resource
• Data publication
Analysis workflow metagenomics*
UniRule
InterPro2GO
Resource list
• Submission: GEO, SRA, ArrayExpress, ENA/GenBank/DDBJ; PRIDE, MetaboLights
• Annotation: UniProt, GO, InterPro, Reactome, PDB, Interactome …
• Visualisation: Ensembl, Networks, Structures …
• External annotations
• Comparative genomics and metagenomics
Importance of meta-data
• Data valuation by addition of meta-data
• Incorrect/inadequate meta-data affects analysis and rediscovery
• No meta-data makes a data set impossible to find, and of no value. Tagging helps.
• Student exercise – Soil/Coral – before they start a submission
• If you use resources, enrich them for use by yourself or others through submissions and annotation
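One way to act on the points above is to check a sample record for blank or missing meta-data before starting a submission. The sketch below is illustrative only: the field names in REQUIRED_FIELDS and the soil sample record are hypothetical, and a real submission should follow the checklist of the target repository.

```python
# Hypothetical minimum field set; real checklists come from the target repository.
REQUIRED_FIELDS = (
    "sample_name",
    "collection_date",
    "geo_location",
    "environment_biome",
    "environment_material",
)


def missing_metadata(record):
    """Return the required fields that are absent or blank in a sample record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Hypothetical soil sample with one blank and one absent field.
soil_sample = {
    "sample_name": "soil_plot_A",
    "collection_date": "2016-10-26",
    "geo_location": "",
    "environment_biome": "grassland biome",
}

print(missing_metadata(soil_sample))
```

Running such a check as a student exercise before submission makes gaps visible while the person who collected the sample can still fill them in.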
Ontologies and standards
Interoperability, searching and reasoning
• Gene Ontology – are more terms needed?
• EnvO – Environment Ontology: Biome, Condition and Material
• EFO – experimental meta-data, more terms may be needed
• FOAM – Functional Ontology Assignments for Metagenomes
• OBIB – ontology for BioBanking
Resource catalogue
BioSamples – deposit and reference study details for 'Omics experiments
'Omics experiments – access using the OMICS Discovery Index (http://www.omicsdi.org)
E.g. EnvO
Resource
• Metagenomics-Rapid Annotations using Subsystems Technology (MG-RAST)
• Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA)
• Integrated Microbial Genomes and Metagenomes (IMG/M)
Annotation transfer
• Limited biochemical resources, limited number of manual curators to transfer data into databases (UniProtKB/Swiss-Prot, GO)
Annotation transfer – Gene Ontology
• InterPro2GO
• EC2GO
• UniProt-keywords2GO
• Ensembl Compara
• UniProt-subcellular locations2GO
• HAMAP2GO
• UniPathway2GO
Annotation transfer – TrEMBL annotation
• UniProt UniRules
All based on InterPro family/domain matches
Annotation transfer - InterPro2GO
InterPro
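As a rough illustration of annotation transfer by InterPro2GO, the sketch below parses mapping lines into a lookup table and then collects the GO terms implied by a protein's InterPro matches. It assumes the mapping lines follow the layout "InterPro:IPRnnnnnn name > GO:term name ; GO:nnnnnnn" with "!" marking comment lines; the sample lines, the accession IPR999999 and the function names are illustrative, not taken from the slides.

```python
import re
from collections import defaultdict

# Assumed line layout:
#   InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:DNA binding ; GO:0003677
LINE_RE = re.compile(r"^InterPro:(IPR\d{6})\s.*>\s*GO:(.+?)\s*;\s*(GO:\d{7})\s*$")


def parse_interpro2go(lines):
    """Build {InterPro accession: set of GO identifiers} from mapping lines."""
    mapping = defaultdict(set)
    for line in lines:
        if line.startswith("!"):  # comment/header lines in the assumed layout
            continue
        match = LINE_RE.match(line)
        if match:
            mapping[match.group(1)].add(match.group(3))
    return mapping


def transfer_go_terms(interpro_matches, mapping):
    """Collect the GO identifiers mapped from a protein's InterPro matches."""
    terms = set()
    for accession in interpro_matches:
        terms |= mapping.get(accession, set())
    return sorted(terms)


# Illustrative sample lines in the assumed layout.
sample_lines = [
    "!example mapping lines",
    "InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:DNA binding ; GO:0003677",
    "InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:nucleus ; GO:0005634",
]
mapping = parse_interpro2go(sample_lines)

# IPR999999 is a made-up accession with no mapping, so it contributes nothing.
print(transfer_go_terms(["IPR000003", "IPR999999"], mapping))
```

The transferred terms are only as good as the domain match and the mapping behind them, which is why the limited supply of manual curation remains the bottleneck.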
Annotation transfer - UniProt
Proprietary data
• Any data generation funded by a commercial entity may have data restrictions associated with it.
• Any data generation involving proprietary organisms/environs may have data restrictions on them.
• Data withdrawal – obsolete vs destroy
Data life-cycle
Sequence/assembly/annotation/RNA-seq
Public domain data deposition
Update annotation
In-house data resource
Data sharing
• E-notebooks, scripts – use GitHub? Curation and removal of dead ends is a real issue
• Will they work for complex multi-step processes?
• Are shell scripts enough? Genome-specific.
• Best-practice recording examples would be appreciated
• Consider Natural History scratchpads
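One possible shape for recording a multi-step process is a small wrapper that runs each step and logs the command together with a checksum of every output file. This is a sketch under assumptions, not an established best practice: the step name filter_reads, the file names and the run_step helper are all hypothetical, and the "step" here is a stand-in command that just writes a short file.

```python
import hashlib
import json
import subprocess
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Checksum a file so a later reader can confirm it is unchanged."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def run_step(name, command, outputs, log):
    """Run one step, then record the command and checksums of its outputs."""
    subprocess.run(command, check=True)  # raises if the step fails
    log.append({
        "step": name,
        "command": command,
        "finished_utc": datetime.now(timezone.utc).isoformat(),
        "outputs": {str(p): sha256_of(p) for p in outputs},
    })


workdir = Path(tempfile.mkdtemp())
out_file = workdir / "filtered.txt"
log = []

# Stand-in for a real analysis command: writes a tiny file via the interpreter.
run_step(
    "filter_reads",
    [sys.executable, "-c", f"open({str(out_file)!r}, 'w').write('ACGT\\n')"],
    [out_file],
    log,
)

# The log can be committed next to the scripts, e.g. in a GitHub repository.
(workdir / "provenance.json").write_text(json.dumps(log, indent=2))
print(json.dumps(log, indent=2))
```

Keeping the log next to the scripts gives each shared result a trace of how it was produced, which helps when pruning dead ends later.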
Summary
• Identify potential issues early in the project life-cycle – spend time identifying issues and planning how to address them
• Prepare to share data as early as possible – what information would you like to see if you were the data user?
• Think beyond the lifetime of the grant: what are your long-term plans for the sustainability of the data?
• If there are issues with access, do feed back. If it is not a primary submission, this should be sorted out.