Post on 08-Jan-2017
Towards Open Research: practices, experiences, barriers and opportunities
3rd Research Data Network, St Andrews, 30 November 2016
Veerle Van den Eynden, UK Data Service, University of Essex
Gareth Knight, London School of Hygiene & Tropical Medicine
Our research
• Researchers funded by Wellcome Trust and ESRC: biomedical, clinical, population health, humanities, social sciences
• Current attitudes and practices related to sharing of:
– Publications
– Data
– Code
• Barriers that inhibit or prevent researchers from sharing
• Identification of actions that funders can take to encourage good practice and mitigate issues
• Survey (N=583 + 259), focus groups (N=22)
Van den Eynden, Veerle et al. (2016) Towards Open Research: Practices, experiences, barriers and Opportunities. Wellcome Trust. https://dx.doi.org/10.6084/m9.figshare.4055448
Data sharing practices
• 95% of respondents generate research data
• 51 / 55% of these made research data available in last 5 years
• 4 / 2 datasets on average: full dataset or subset, e.g. with paper
• sharing increases with career length
• sharing varies by discipline
• 77% reuse existing data for: background, validation, methodology development & new analysis
Reasons to share data (Wellcome)
My funder requires me to share my data (N=273)
Journal expects data underpinning findings to be accessible (N=273)
My research community expects data sharing (N=274)
It is good research practice to share research data (N=277)
It enables collaboration and contribution by other researchers (N=274)
It has public health benefits, e.g. disease outbreaks (N=265)
Ability to respond rapidly to public health emergencies (N=263)
Ethical obligation towards research participants to maximize benefits for society (N=266)
Contributes to academic credentials (N=273)
Enables validation and/or replication of my research (N=275)
Improved visibility for my research (N=273)
I can get credit and more citations by sharing data (N=267)
Response scale (0% to 100%): Not at all important, Slightly important, Moderately important, Very important, Extremely important
Reasons to share data (ESRC)
My funder requires me to share my data (N=131)
Journal expects data to be accessible (N=132)
My research community expects data sharing (N=131)
It is good research practice to share research data (N=133)
Collaboration and contribution by other researchers (N=131)
It has public health benefits, e.g. disease outbreaks (N=125)
Ability to respond rapidly to public health emergencies (N=122)
Ethical obligation/Maximize benefits for society (N=128)
Contributes to academic credentials (N=128)
Enables validation and/or replication of my research (N=129)
Improved visibility for my research (N=128)
I can get credit and more citations by sharing data (N=127)
Response scale (0% to 100%): Not at all important, Slightly important, Moderately important, Very important, Extremely important
Benefits from data sharing: collaborations, higher citation rates
Most report no direct benefits, but also no bad experiences
Barriers to data sharing (Wellcome)
I may lose publication opportunities if I share data (N=517)
Others may misuse or misinterpret my data (N=519)
I have insufficient skills to prepare the data (N=505)
It requires time/effort to prepare my data for deposit (N=520)
I do not have sufficient funding to prepare data for sharing (N=509)
I do not have permission (consent) from my research participants to share data (N=510)
Data contain confidential / sensitive information and cannot be de-identified (N=504)
My data are commercially sensitive or have commercial value (N=501)
There are third party rights in my data (N=499)
No suitable repository exists for my data (N=502)
Country-specific regulations do not allow sharing (N=486)
Response scale (0% to 100%): Not at all important, Slightly important, Moderately important, Very important, Extremely important
Barriers to data sharing (ESRC)
I may lose publication opportunities (N=231)
Others may misuse or misinterpret my data (N=229)
I have insufficient skills to prepare the data (N=227)
It requires time/effort to prepare data for deposit (N=233)
Insufficient funding to prepare data (N=232)
No consent from research participants to share data (N=232)
Confidential / sensitive data (N=229)
Commercially sensitive/has commercial value (N=218)
There are third party rights in my data (N=219)
No suitable repository exists for my data (N=220)
Country-specific regulations do not allow sharing (N=214)
Response scale (0% to 100%): Not at all important, Slightly important, Moderately important, Very important, Extremely important
Motivations for more data sharing (Wellcome)
Significant differences in motivations (MORE IMPORTANT / LESS IMPORTANT)
• Extra funding to cover costs
– established researchers ~
– cell, development and physical science, genetic and molecular science, neuroscience and mental health, population health
– infection and immunobiology
• Enhanced academic reputation
– early career researchers ~
– researchers not sharing data now
• Knowing how other people use data
– early career researchers ~
– LMIC researchers ~
– cell, development and physical science, humanities, infection and immunobiology, population health
– genetic and molecular science
• Co-authorship on reuse papers
– early career researchers
– clinical, population health, social science researchers
– cell, development and physical science, neuroscience and mental health
– biomedical and humanities researchers, genetic and molecular science, infection and immunobiology
• Case study that showcases data
– LMIC researchers ~
– humanities, infection and immunobiology, population health
– cell, development and physical science, genetic and molecular science, neuroscience and mental health
• Data deposit leads to data paper publication
– early career researchers; LMIC researchers ~
– cell, development and physical science, infection and immunobiology, neuroscience and mental health
– genetic and molecular science, humanities and social sciences
• Considered favourably in funding and promotion decisions
– UK-based researchers ~
– cell, development and physical science, genetic and molecular science, neuroscience and mental health
– population health
• Evidence of data citation
– early career researchers
– researchers not sharing data now
• Ability to limit data access to specific purposes or individuals
– LMIC researchers ~
– clinical, population health and social science researchers
– biomedical researchers
• Assistance from institution or funder to prepare data
– clinical, population health and social science researchers
– biomedical and humanities researchers
• Nothing would motivate
– researchers not sharing data now
Code sharing practices
• 40% generate code
– Researchers performing surveys, secondary analysis & simulations more likely to produce code
• 43% of these made code available in last 5 years
– Researchers performing simulations, secondary analysis and experiments share most code
– Researchers applying qualitative and survey methods shared less
• 37% reuse existing code
– Obtained from colleagues/collaborators & community repository
– Good documentation, origin from a reputable source, and open availability are key factors in code reuse
Reasons to share code (Wellcome)
My funder requires me to share my code (N=97)
Journal expects code to be accessible (N=97)
My research community expects code sharing (N=97)
It is good research practice to share code (N=101)
To enable collaboration and contribution (N=98)
Contributes to my academic credentials (N=95)
Enables validation of my research (N=97)
Enables replication of my research (N=96)
Improved visibility for my research (N=95)
I can get credit and more citations by sharing code (N=91)
Response scale (0% to 100%): Not at all important, Slightly important, Moderately important, Very important, Extremely important
Code sharing benefits (Wellcome)
Career benefits
More publications
Higher citation rate
New collaborations
More funding opportunities
Financial benefit
New patents
Improvements to public health
Use in health emergencies
None
Other
Code sharing barriers (Wellcome)
Desire to patent (N=210)
Protecting intellectual property (N=213)
Software and systems dependencies (N=213)
I may lose publication opportunities if I share code (N=210)
Others may misuse or misinterpret my code (N=211)
Insufficient skills to prepare the code for public use (N=213)
It requires time/effort to prepare my code for deposit (N=217)
Insufficient funding to prepare code for public use (N=211)
My code has commercial value (N=207)
There are third party rights in my code (N=206)
No suitable repository exists for my code (N=197)
Response scale (0% to 100%): Not at all important, Slightly important, Moderately important, Very important, Extremely important
Motivations for more code sharing (Wellcome)
Financial incentive from my institution
Extra funding to cover the costs
Enhanced academic reputation
Code access and metrics
Knowing how others use my code
Co-authorship on papers resulting from reuse
Case study that showcases my code
It is looked on more favourably in funding and promotion decisions
Evidence of code citation
Assistance from institution/funder staff to prepare code
Nothing motivates me
PROPOSED ACTIONS
Data Sharing & Reuse
Policy development
• Provide guidelines on how to share 'difficult' data types, e.g. sensitive and large data
• Consider how contradictions between government and funder data sharing policy can be addressed
Rewards
• Ensure data sharing is recognised in career progress evaluation
• Facilitate opportunities for data creators to become co-authors on new publications based upon their data
Promotion
• Monitor use and showcase examples of best practice
• Provide networking/training opportunities for data creators and re-users
Infrastructure development
• Build repository that offers free storage, supports granular access controls, and resource-specific features (e.g. imaging data, large datasets)
Funding
• Consider a dedicated funding stream to cover data/code preparation for projects, and additional staff within institution/project/support network to help with data preparation
Code Sharing & Reuse
Policy development
• Consider code sharing mandate
• Include processing scripts such as Stata .do files and batch files in interpretation
Rewards
• Recognise in funding decisions
• Encourage authors to cite code in research outputs
Promotion
• Monitor code reuse and showcase examples of code sharing best practice
• Provide networking/training opportunities for code developers and code re-users
Infrastructure development
• Invest in creation of deposit tools
• Consider setup of a long-term repository for research code (e.g. Wellcome GitLab), or offer guidance on platforms to use
Funding
• Consider additional funding for code sharing preparation during project life and ongoing maintenance over time
Further developments
• Wellcome Open Research platform
• Wellcome Open Research Pilot Project (Cambridge)
• Series of reports and reviews
Wellcome Trust, David Carr, Robert Kiley
Anca Vlad, UK Data Service
All researchers contributing wisdom via surveys and focus group discussions
Expert advisors: Barry Radler (University of Wisconsin), Carol Tenopir (University of Tennessee), David Leon (LSHTM), Frank Manista (Jisc), Jimmy Whitworth (LSHTM) and Louise Corti (UK Data Service)