Post on 18-Jan-2018
CyVerse-enabled NCBI Sequence Read Archive (SRA) Submission Pipeline
-A Part of the CyVerse Data Commons Effort
Scientific Project Management in the CyVerse Data Commons
Support collaboration, reproducibility, data publication, discovery, and reuse
Prepare data/metadata for publication – Request permanent identifiers – Submit to the CyVerse Data Commons Repository - Submit to canonical repositories
Staging Area
Within the CyVerse Data Store – Public, static, searchable, discoverable – Disseminate published data with permanent identifiers – Browse, search, discover, and reuse datasets
Data Commons Repository
Planned 4th DE tab - View/Edit custom or standardized metadata – Track analysis history – Enhanced tools to share data and analyses with collaborators over a project lifecycle
Projects Interface
Data Commons services are in development
SRA Submission Overview
• SRA Home Page – safe to assume familiarity?
• SRA Submission Quick Start Guide
– Create BioProject and BioSample(s)
– SRA Submissions = compressed seq files and metadata for ‘Experiments’ and ‘Runs’ associated with BioProject and BioSample(s)
Submitting to the NCBI Sequence Read Archive can be Time Consuming
• Submission Package = compressed seq files and metadata for associated BioProject, BioSample(s), and sequencing libraries
• Pain Points:
– Independent BioProject and BioSample creation
– Data compression
– Checksum generation
– Copy-paste errors
– Correct metadata templates and formats
– Slow and/or interrupted uploads
– Error correction
• Worked with SRA to create interoperable submission workflow in the Discovery Environment
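Two of the pain points above, data compression and checksum generation, are mechanical enough to sketch in a few lines. The following is a minimal illustration only, not the CyVerse implementation: it assumes uncompressed files ending in `.fastq`, gzip compression, and MD5 checksums, and the function names (`gzip_file`, `md5sum`, `prepare_package`) are hypothetical.

```python
import gzip
import hashlib
import shutil
from pathlib import Path
from typing import Dict


def gzip_file(src: Path) -> Path:
    """Compress src to '<name>.gz' in the same folder and return the new path."""
    dest = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def prepare_package(folder: Path) -> Dict[str, str]:
    """Gzip every *.fastq file in folder (an assumed naming convention)
    and map each compressed file name to its MD5 checksum."""
    checksums = {}
    for src in sorted(folder.glob("*.fastq")):
        compressed = gzip_file(src)
        checksums[compressed.name] = md5sum(compressed)
    return checksums
```

Calling `prepare_package(Path("my_submission"))` on a hypothetical folder would return a mapping of compressed file names to checksums. Doing this by hand for every file is the kind of step where copy-paste errors creep in.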
CyVerse Users Asked for Help
• Explored browser-based and bulk submissions
• Developed pipeline in collaboration with SRA staff
• Set up Aspera Connect on CyVerse systems and linked to SRA test servers
• Built submission pipeline in the Discovery Environment
CyVerse-Enabled SRA Submissions Remove Roadblocks
1. Upload Data into CyVerse Discovery Environment
– Efficient tools already in place within the DE
– Batch compress data (if not already compressed before upload)
2. Create and organize submission package
– Eliminate need for independent BioProject and BioSample creation and checksums
3. Enter metadata from templates and save package metadata to file
– Eliminate metadata burden with dropdown menus, instructions, and metadata copying
– CyVerse system generates single metadata XML file for submission
4. Submit package with metadata using the CyVerse SRA submission App
– Eliminate data transfer problems with CyVerse transfers via Aspera Connect
5. Receive submission notification from SRA
– SRA validates submissions and communicates with users as usual; CyVerse is not ‘in the way’
6. If needed, correct package errors and resubmit
– Correct errors and resubmit from CyVerse, without having to recreate submission package
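Step 3 above mentions that package metadata is saved to a single XML file. As a rough sketch of what collecting BioProject, BioSample, and file metadata into one XML document can look like, consider the following. The element and attribute names (`Submission`, `BioProject`, `Title`, `BioSample`, `File`, `md5`) are placeholders chosen for illustration; they are not the actual SRA schema or the file CyVerse produces, and `build_submission_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_submission_xml(project_title: str, samples: List[Dict]) -> str:
    """Build one XML document describing a project, its samples, and files.

    Each sample is a dict with a 'name' and a 'files' mapping of
    file name -> checksum. All tag names here are illustrative only.
    """
    root = ET.Element("Submission")
    project = ET.SubElement(root, "BioProject")
    ET.SubElement(project, "Title").text = project_title
    for sample in samples:
        sample_el = ET.SubElement(root, "BioSample", name=sample["name"])
        for file_name, checksum in sample["files"].items():
            ET.SubElement(sample_el, "File", name=file_name, md5=checksum)
    return ET.tostring(root, encoding="unicode")


# Placeholder values, not real submission data.
example = build_submission_xml(
    "Example project",
    [{"name": "sample_1",
      "files": {"sample_1.fastq.gz": "d41d8cd98f00b204e9800998ecf8427e"}}],
)
```

Keeping everything in one document means a correction (step 6) can be made by editing the metadata and regenerating the file, rather than rebuilding the package from scratch.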
CyVerse SRA Submission Workflow
• BioProject Creation App
– For creating a new BioProject with submission
• BioProject Update App
– For updating an existing BioProject for submission
• Submission Report Retrieval App
• Submission Tutorial
– Example Data in the DE at: /iplant/home/shared/iplantcollaborative/example_data/SRA_submission
Demo – SRA Submission Example Package