Date posted: 09-Jan-2017
Category: Science
Uploaded by: manuel-corpas
We are always looking for data
Finding & Accessing
Human Genomic Datasets
CRUK, 7th November 2016
Tweets welcome: #CamFindData @repositiveio
Outline of the day
- Data sources and data access
- Case study: University of Cambridge
- Coffee break
- Introduction to Repositive
- Hands-on session: searching for data
- Round up and closure
On-line tools used during the workshop
To ask questions during the presentation and answer questions:
go to slido.com
enter event code: 7315
• 2001: First Human Genome Sequence
• 2005: Personal Genome Project
• 2008: UK10K
• 2013: UK 100K Project
• 2015: 1M Precision Medicine US
• 2016: AstraZeneca–HLI 2M
• Many other national and international projects
Genome Technology Evolution
• Consensus among researchers, clinicians, politicians & the public that genomics will transform biomedical research, healthcare and lifestyle choices (Stephan Beck, UCL)
OPPORTUNITY
Data should be made available
• Required by funders
• Cannot publish unless accession number given
• Specialised
  • ENA
  • EGA
  • dbGaP
  • dbSNP…
• Generalist
  • Dryad
  • figshare
Public Repositories
• Open Access
  • E.g. PGP, CC0
  • Bermuda Accord
• Managed (Restricted or Controlled Access)
  • Data Access Committee
  • No effective agreement (policy vacuum)
• Global Alliance for Genomics & Health
  • enable compatible, readily accessible, and scalable approaches for sharing
GOVERNANCE Models
Open vs Managed Access
Open Access
75,000,000 per month
Managed Access
150 per month
500,000 fold difference (Stephan Beck, UCL)
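The fold difference quoted above follows directly from the two per-month figures on the slide. A minimal sketch of that arithmetic, where the variable names are illustrative and not taken from the slides:

```python
# Per-month figures as quoted on the slide (variable names are hypothetical).
open_access_per_month = 75_000_000
managed_access_per_month = 150

# Ratio of open-access to managed-access activity.
fold_difference = open_access_per_month // managed_access_per_month
print(f"{fold_difference:,}-fold difference")
```

The division is exact, so integer division gives the same 500,000 figure shown on the slide.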
Large amounts of data, but not accessible
[Chart, 2006 to 2015: genome data available in public repos, exponential growth rate]
• 80+ PB sequenced
• ≈0.5 PB Open Access
Under-utilised data has huge potential for medical research
Access to Managed Data
Benefits:
• Strict governance
• Individuals are protected
• Review of consent
• Applicant signs for full responsibility for governance
Disadvantages:
• No control of data once access is given
• High barrier for access – too high?
Often a long process
Bottlenecks:
• Finding relevant and usable data
• Getting authorisation to access data
• Formatting data
• Storing and moving data
We studied the problem with qualitative interviews followed by a survey of researchers in human genetics
T. A. van Schaik et al., The need to redefine genomic data sharing: a focus on data accessibility, Applied & Translational Genomics, 2014. http://tinyurl.com/schaik-dnadigest
NIH / eRA Commons login
No
Yes
Organisation registered with eRA
Organisation has DUNS number
No
No
Write research proposal
Yes
+ 2-3 days
+ 1-2 weeks
+ 1 week
Yes
Submit proposal
+ 1-2 days
Access granted
Find/Download/Decrypt data
+ 1-4 weeks
Science…
+ 1-2 days
PRO Tip: If you use human genomic data, apply for the GRU datasets in dbGaP: one application gives access to all the GRU datasets.
dbGaP application process
Blog Post: http://blog.repositive.io/how-to-successfully-apply-for-access-to-dbgap/
Sanger eDAM Account
No
Write research proposal
+ 1 hour
Yes
Submit proposal
+ 1-2 days
Access granted
Find/Download/Decrypt data
+ 2-7 days
Science…
+ 1-2 days
EGA application process
Blog Post: http://blog.repositive.io/how-to-successfully-apply-for-access-to-ega/
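To get a rough feel for how long each application route might take end to end, the delays shown on the dbGaP and EGA flowcharts can be added up. The sketch below is only an estimate under stated assumptions: it assumes every delay on a flowchart is incurred once, one after another (that is, every "No" branch is taken), and that one week is 7 calendar days. The names `DBGAP_STEPS`, `EGA_STEPS` and `total_range` are hypothetical and exist only for this illustration.

```python
from datetime import timedelta

# (shortest, longest) delay for each step, copied from the flowcharts.
# Assumption: all delays are sequential and each is incurred exactly once.
DBGAP_STEPS = [
    (timedelta(days=2), timedelta(days=3)),    # + 2-3 days
    (timedelta(weeks=1), timedelta(weeks=2)),  # + 1-2 weeks
    (timedelta(weeks=1), timedelta(weeks=1)),  # + 1 week
    (timedelta(days=1), timedelta(days=2)),    # + 1-2 days
    (timedelta(weeks=1), timedelta(weeks=4)),  # + 1-4 weeks
    (timedelta(days=1), timedelta(days=2)),    # + 1-2 days
]
EGA_STEPS = [
    (timedelta(hours=1), timedelta(hours=1)),  # + 1 hour
    (timedelta(days=1), timedelta(days=2)),    # + 1-2 days
    (timedelta(days=2), timedelta(days=7)),    # + 2-7 days
    (timedelta(days=1), timedelta(days=2)),    # + 1-2 days
]


def total_range(steps):
    """Sum the shortest and the longest delays across all steps."""
    shortest = sum((low for low, _ in steps), timedelta())
    longest = sum((high for _, high in steps), timedelta())
    return shortest, longest


for name, steps in (("dbGaP", DBGAP_STEPS), ("EGA", EGA_STEPS)):
    shortest, longest = total_range(steps)
    print(f"{name}: roughly {shortest.days} to {longest.days} days")
```

Under these assumptions the totals come to roughly 25 to 56 days for dbGaP and roughly 4 to 11 days for EGA. An applicant who already has the required accounts and registrations would skip some of the early delays.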
• Finding specific relevant genomic data for research can take up to six months for an untrained researcher without dedicated tools
• Application & response time for data access committees can vary widely depending on
  • the type of dataset
  • consent regulations of the study
• => there is no consensus for the ‘contracts’ between each dataset
FACTS
Researchers often choose to not access data at all
WHY should we bother?
• Validate existing studies
• Avoid unnecessary duplication
• Compare to new studies
• Enhance new datasets
Why datasets are useful
Case studies
Raquel, PhD Student, London, UK.
Researching genes associated with rare eye disorders.
Problems:
- Doesn’t know where to look for data.
- Doesn’t know if data even exists.
“I gave up on finding the data - it was very time consuming and not proving fruitful – so I started focusing more on generating my own data.”
Case studies
Mahantesh, Academic Researcher, Taipei, Taiwan.
Studying pharmacogenomics in cardiovascular epidemiology.
Problems:
- Needs lots of data.
- Knows it exists but struggles with getting access to it.
“Often it’s very hard to get the required number of cases and controls to carry out research in public health and epidemiology.”
Case studies
Jana, Company Biocurator, Zurich, Switzerland.
Biocurating microarray and RNA-Seq data.
Problems:
- Needs lots of data.
- Lots of data out there but hard to filter down to ‘useful/relevant’ data.
“Many repositories don’t list the metadata details I need to know if a dataset is useful to me, I can waste a lot of time searching.”
How many data sources?
How many sources of human genomics data do you know about?
Data sources across the globe
Geolocation of 278 data sources analysed.
Found by tracking IP address of the source.
These include:
• Public Repositories
• Universities
• Companies
• BioBanks
• Research consortiums
Data source content
Assay Types
Dedicated to…
DATA is fragmented
Hundreds of data sources… but they aren’t easy to find!
First 30 data sources listed here: http://tinyurl.com/plos-biology-repositive
[Chart, y-axis 0 to 300. Dates: Jan-15, Mar-15, Jun-15, Sep-15, Dec-15, Mar-16, Jun-16. Values: 10, 25, 33, 35, 102, 174, 239]
Cambridge specific Case Study
• Postdoctoral researcher at University of Cambridge Medical School
• Working on genetic inheritance and cancer
• Using NGS data and bioinformatics
• After searching for data online she decided to apply for:
  • 2 dbGaP datasets
  • 3 EGA datasets
Cambridge specific Case Study
Blog Post: Pending… will be on http://blog.repositive.io/
The Research Operations Office will help you with the contracts (Data Transfer Agreements, DTAs) and signatures.
• Has a designated individual who processes all dbGaP applications, as they all abide by NIH legal restrictions and regulations about how to handle the data once granted access
• For EGA applications, each DTA must be processed separately because there is no consensus for the ‘contracts’ between each dataset.
Cambridge specific Case Study
The nominated IT director will be specific to your department.
• They will need to confirm you can support the requirements of the DTA.
• If the head of your departmental IT is not happy to sign, the head of IT for the University will be able to sign it off.
Cambridge specific Case Study
Top Tips:
• Think about your storage space!
• Think about what sort of analysis and processing you are going to do with the data once you do have it. After such a long process, the approval could be too quick.
• Understand what you need before you start the application process!
• You may have access for a limited period
Cambridge specific Case Study
COFFEE BREAK
Back in 10’
@repositiveio
1-click to human genomic data access
to make finding data as easy as finding a book on Amazon or booking a hotel on Expedia!
Simpler workflow for data access
Our expertise is data search platforms
Discover and access
Search, see related results
Find colleagues & their data interests
Co-annotate data & community feedback
We are enabling best practices
MAKE DATA DISCOVERABLE
SIMPLIFY WORKFLOWS
CONTRIBUTE TO COMMUNITY
DNAdigest and Repositive – Connecting the world of genomic data
http://www.tinyurl.com/plos-biology-repositive
Connecting the world of genomic data
1. Form groups of 2-3 people
2. Select a leader & a spokesperson
3. Choose 1 data theme you are interested in
   1. E.g., colon cancer, prostate cancer, breast cancer
4. Sign up at https://discover.repositive.io/
5. Search Repositive with the selected theme
Hands on
Team presentation: 2 minutes
1. Introduction
What data did you try to find and why?
Have you tried to search for this data before?
2. Methods
The 5 main steps you took on Repositive to try and find this data.
3. Results
Did you find the data on Repositive?
What challenges did you encounter?
4. Conclusion
Sum up your experience in 1 sentence.
Feedback on the workshop
Bugs and feedback to: Charlotte at Repositive.io
Please leave your feedback on the workshop:
http://tinyurl.com/feedback280916
http://discover.repositive.io @repositive