Data submission services of EMBL Australia Bioinformatics Resource (EMBL-ABR) · 2019-04-30 ·...



Activity Report - June 2018

Data submission services of EMBL Australia Bioinformatics Resource (EMBL-ABR)

Overview

Since January 2016, the QFAB@QCIF team has provided data submission services on behalf of EMBL-ABR. These services refer to the guidance and support provided to help Bioplatforms Australia (BPA) and Australian researchers with the process of curating, formatting and managing research data for transfer to existing international data repositories, where it will be publicly accessible for reuse.


EMBL-ABR’s Data Submission Service

The QFAB team from the EMBL-ABR: QCIF Node uses a range of scripts and standard operating procedures to support the submission of Australian sequence-based data to the EBI European Nucleotide Archive (ENA) and the NCBI Sequence Read Archive (SRA). The service includes:

• management of ENA and SRA data submission accounts accessible by researchers
• automation of upload processes, saving researchers’ time
• optimisation of data transfer processes to ensure data integrity and reduce transfer failure
• provision of staging infrastructure to facilitate submissions from the researcher’s perspective
• verification and collation of required metadata prior to submission
• submission of researcher data to ENA and SRA
• submission of selected Bioplatforms Australia data
• maintaining boutique data submission tools such as Tox|Note (for venom-gland transcriptome data submission and toxin card creation on ArachnoServer) and the system developed for the Beatson Group and collaborators
• a helpdesk for support.
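The report does not describe the team's scripts, so the following is only a minimal sketch of one common way to check data integrity after a transfer: comparing MD5 checksums of the source file and the staged copy. The function names `md5_of_file` and `verify_transfer` are hypothetical and are not taken from the QFAB tooling.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks
    so that large sequence files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(source_path, staged_path):
    """Return True when the staged copy has the same MD5 digest as the source.

    Hypothetical helper: a mismatch would suggest a failed or partial transfer.
    """
    return md5_of_file(source_path) == md5_of_file(staged_path)
```

A check of this kind could be run on the staging machine before a file is submitted, so that a corrupted copy is caught early rather than after upload.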

Testimonials – 2018

EMBL-ABR sequence submission service was an immense help for the submission of data for our manuscript. Given the number of samples that we needed to submit, having some help on getting it up to EBI saved us valuable time. Given the shared access to NeCTAR computing resources we were able to transfer data between institutes easily as well. Both myself and other members of the University of Adelaide Bioinformatics Hub have already recommended students and PIs to make use of the service when submitting data for publication.

Jimmy Breen
Robinson Research Institute, Core Facility Leader (Bioinformatics) at University of Adelaide

Dear Dominique and team, Thank you SO much for your work on this to date – it has been brilliant.

Rebecca Johnson
Australian Museum Research Institute (AMRI)

I had a very good experience with using the EMBL-ABR sequence submission service. Nick and Gareth made it very easy to collate and submit our datasets to the ENA database ahead of our publication in Genetics. I would highly recommend them to any researcher that deals with archiving and submitting large datasets.

David Schlipalius
School of Biological Sciences, The University of Queensland


Recent Representative Publications

Global DNA Methylation Patterns Can Play a Role in Defining Terroir in Grapevine (Vitis vinifera cv. Shiraz)
H Xie, M Konate, N Sai, KG Tesfamicael, T Cavagnaro… - Frontiers in plant science, 2017

Variant linkage analysis using de novo transcriptome sequencing identifies a conserved phosphine resistance gene in insects.
Schlipalius, David I., Tuck, Andrew G., Jagadeesan, Rajeswaran, Nguyen, Tam, Kaur, Ramandeep, Subramanian, Sabtharishi, Barrero, Roberto, Nayak, Manoj and Ebert, Paul R. (2018). Genetics 209 (1) 281-290.

ArachnoServer 3.0: an online resource for automated discovery, analysis and annotation of spider toxins.
Pineda SS, Chaumeil PA, Kunert A, Kaas Q, Thang MWC, Le L, Nuhn M, Herzig V, Saez NJ, Cristofori-Armstrong B, Anangi R, Senff S, Gorse D, King GF. Bioinformatics. 2018 Mar 15;34(6):1074-1076

Submission statistics


QFAB@QCIF team

Various members of the QFAB team are involved in the provision of the data submission services:

Nick Rhodes

• Contact person for all users and stakeholders, including ENA
• Process design and improvement
• Management of user accounts
• Quality control of metadata prior to submission
• Submission of data
• Technical support

Mike Thang and Thom Cuddihy

• Manipulation of BAM files using SAMtools
• Process implementation
• Submission of data

Jeff Christiansen

• Broadening of the range of supported submissions
• Identification of metadata requirements
• Investigation of metadata management systems
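The quality control of metadata mentioned in the roles above is not specified further in this report. As an illustration only, a pre-submission check might flag records with missing or blank required fields. The field names in `REQUIRED_FIELDS` and the function `find_metadata_problems` are assumptions made for this sketch, not the actual ENA or SRA checklists and not the team's procedure.

```python
# Hypothetical required fields; real repository checklists differ by sample type.
REQUIRED_FIELDS = ("sample_alias", "scientific_name", "collection_date")


def find_metadata_problems(records, required_fields=REQUIRED_FIELDS):
    """Return (row_number, field) pairs for every missing or blank required value.

    `records` is a list of dicts, one per sample; row numbers start at 1.
    """
    problems = []
    for row_number, record in enumerate(records, start=1):
        for field in required_fields:
            value = record.get(field)
            if value is None or not str(value).strip():
                problems.append((row_number, field))
    return problems
```

An empty result would mean every record carries all of the assumed fields; anything else gives the helpdesk a concrete list to send back to the researcher.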

Development and improvement of processes for data submission

The QFAB@QCIF team has improved the efficiency and ease of use of the data submission process by:

• Automating some manual steps
• Deploying and maintaining a dedicated VM on QRIScloud for data submission

  o Linux accounts for users to upload data
  o User “handholding”, as required
  o Volume storage allocated as required
  o NFS access to the BPA collections on QRIScloud RDS storage
  o Aspera Secure Copy client


Supporting BPA with the submission of data to ENA

The QFAB@QCIF team is supporting submissions of data for the following BPA projects:

• BASE project
• Great Barrier Reef project
• Marine Microbes project (not yet started)

Supporting Australian researchers with the submission of data to ENA and SRA

The QFAB@QCIF team is supporting data submission activities for the Australian community:

• Bacterial genomes – Scott Beatson, UQ
• Spider and other toxins – Glenn King, UQ
• Tasmanian Devil – Belinda Wright, University of Sydney
• Sponge – Degnan Lab, UQ
• Porphyromonas gingivalis – Helen Mitchell, UoM
• Streptococcus pneumoniae – Bioinformatics Hub, University of Adelaide
• Indian Myna genome assembly assessment (Australian Museum Research Institute, AMRI)
• MSGBS samples from Barossa grapes – Bioinformatics Hub, University of Adelaide
• MSGBS samples, salt-induced alterations of DNA methylation in barley – Stephen Pederson, Bioinformatics Hub, University of Adelaide

We have recently adopted a more proactive approach to promoting the service, including high-profile links on the EMBL-ABR web page. We anticipate that more researchers across Australia will be interested in the data submission services as visibility increases.

Maintaining developed boutique data submission tools

Tox|Note

In 2014/2015 Tox|Note, a toxin analysis workflow, was developed in collaboration with Glenn King’s Group (UQ), EMBL-ABR (formerly BRAEMBL) and QFAB Bioinformatics to significantly fast-track the analysis of venom-gland transcriptomes generated by large Next Generation (NG) sequencing projects and allow an easy and simple submission of the findings via EMBL-ABR as data broker to ENA/UniProt.

For this purpose, EMBL-ABR and QFAB Bioinformatics worked closely together to integrate a data submission module into Tox|Note, allowing researchers to submit their sequences with the required metadata, obtain accession numbers and automatically create toxin cards on ArachnoServer, a global and public repository for spider toxin and structure research available at http://www.arachnoserver.org.


SRA upload workflow

An SRA upload workflow was created for submissions of bacterial genomes to the SRA from the Beatson Group. The tool integrates authentication, proxy handling, messaging protocols (Slack) and recursive file handling, built on the Linux-standard vsftpd (Very Secure File Transfer Protocol Daemon).
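The internals of the Beatson Group upload tool are not given in this report. Purely as a sketch of what "recursive file handling" can look like, the snippet below walks a directory tree and lists the files that would be queued for upload. The function name `collect_upload_files` and the default file suffixes are assumptions chosen for illustration.

```python
from pathlib import Path


def collect_upload_files(root, suffixes=(".fastq.gz", ".bam")):
    """Recursively list files under `root` whose names end with one of `suffixes`.

    Returns sorted POSIX-style paths relative to `root`, so the same layout
    can be recreated on the receiving side. The suffixes are assumed defaults.
    """
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.name.endswith(tuple(suffixes))
    )
```

Keeping the paths relative to the upload root is a design choice of this sketch; it avoids leaking local directory names into the submission and makes the file list easy to compare before and after transfer.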

The QFAB team has continued to support these boutique data submission services:

• Maintenance of the Tox|Note workflow
• Submission of newly identified toxins by Tox|Note to ENA and UniProt
• Submission of bacterial genomes from the Beatson Group and maintenance of the bespoke upload tool deployed for this purpose

Submission request tracking spreadsheet

Data submission requests are tracked and shared with the EMBL-ABR Hub through a Google spreadsheet available at: https://docs.google.com/spreadsheets/d/1WtGL7IQf-a09kEVH79yqvC09HnTGT4KuC74_hZEiF3w/edit?usp=sharing

Each submission request is different, with some requests being for one sample only whilst others could be for hundreds or even thousands of samples. As such, the amount of support required for each request in the tracking spreadsheet varies vastly. We believe that our interactive, personal approach to client requirements is fundamental to the service's appeal to users. Experienced wet-lab scientists may lack the time or skills to negotiate the submission process; indeed, the delays observed between the dates of sequencing runs and when we are first approached indicate that there is an accumulated backlog of submissions.

Nick Rhodes & Dominique Gorse

QFAB Bioinformatics, QCIF

BIOINFORMATICS | BIOSTATISTICS | BIODATA