Managing, Sharing and Curating
Your Research Data
in a Digital Environment
Sonia Barbosa, Manager of Data Curation, Harvard Dataverse
Danny Brooke, Dataverse Development Project Manager, Harvard Dataverse
Data Sharing Stories
-Different levels of openness in sharing data
-Verification of reproducibility
-Data loss
AGENDA
● Open Science Principles (Open Access, Open Data)
● Connecting Research Articles to Data
● Data Discoverability and Standard Citation
● Increasing Data Availability Statement Requirements by
Publishers
● Common Data Management and Curation Related
Challenges
● Common Discipline Specific Challenges in Data
Sharing And Curation (e.g. Arts; Humanities vs. STEM)
● Research Data Management Solutions with Dataverse
● Success Stories in Reuse of Datasets Found in Open
Data Repositories
● Success Stories in Raising Research Visibility with Data
Sharing
● Dataverse Roadmap
● Dataverse Integration with Other Data Repositories
(e.g. OSF)
● Dataverse Community and How To Get Engaged
OPEN SCIENCE PRINCIPLES
(OPEN ACCESS, OPEN DATA)
https://www.economist.com/news/leaders/21721656-data-economy-demands-new-approach-antitrust-rules-worlds-most-valuable-resource
● Greater access to public research data
● Access enabled by tools and platforms
● Broader collaboration in science
● The use of alternative copyright tools for diffusing research results
OPEN SCIENCE IMPLIES...
● Reducing the costs of data collection by facilitating the exploitation of dormant
or inaccessible data at low cost.
● Increasing the opportunities for collaboration in research as well as in innovation.
● Greater access to research data can also help advance science's contribution to solving global
challenges by enhancing access to data on a global scale (e.g. in the case
of climate change data).
● Open science can also be used to promote capacity building in developing countries
while generating opportunities for scientific collaboration and innovation between
developing countries.
THE BENEFITS OF OPEN SCIENCE...
WHY OPEN ACCESS?
Image: Aston University Library Services
● Open Access seeks to return scholarly publishing to its original purpose: to spread knowledge and allow that knowledge to be built
upon.
● Better visibility and higher impact for your scholarship.
● Avoiding duplication.
● Science can achieve its full potential.
● Text Mining (not possible behind “subscription” walls).
● More knowledge leads to better outcomes (for patients).
● Patients
● Developing countries
● Doctors
● Open Access raises the profile of research performed in the developing world - locally and globally.
● Demonstrated benefits
© 2007-2010 SPARC, subject to a Creative Commons Attribution 3.0 License
WHY OPEN DATA?
“The benefits of Open Data are diverse and range from
improved efficiency of public administrations, economic
growth in the private sector to wider social welfare.”
“Knowledge is open if anyone is free to access, use,
modify, and share it — subject, at most, to measures that
preserve provenance and openness.” (Open Definition,
published by Open Knowledge, 2015)
Key requirements for open data
● Availability
● Access
● Redistribution and reuse
© 2007 - 2018 SPARC, subject to a Creative Commons Attribution 4.0 International License
Overview of funders' data policies | Digital Curation Centre: http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
Arts and Humanities Research Council (AHRC)
Biotechnology and Biological Sciences Research Council (BBSRC)
Engineering and Physical Sciences Research Council (EPSRC)
Copyright © Info-communications Media Development Authority
© 2017 Government of Singapore https://index.okfn.org/methodology/
Guiding principles for the conduct of open
science at the Montreal Neurological Institute and
Hospital (MNI).
These principles cover five areas: the public release
of data and other scientific resources; external
research partnerships; the MNI Biobank; researcher
and patient autonomy; and intellectual property. The
authors developed draft Guiding Principles based on
the results of this study. This draft was then presented
to the MNI staff, management and researchers, who
reviewed and amended the draft during two rounds of
discussion and feedback. These Guiding Principles
were adopted by the MNI in December 2016.
eLife 2017;6:e29319 DOI: 10.7554/eLife.29319
CONNECTING RESEARCH ARTICLES TO DATA
The FAIR Data Principles
https://www.force11.org/group/fairgroup/fairprinciples
Sünje Dallmeier-Tiessen (CERN), http://slideplayer.com/slide/5768687/
Article DOI
Dataset identifier
DATA DISCOVERABILITY AND STANDARD CITATION
DataCite
● Open Access standards for Datasets
● International in scope including universities, research institutions, data governance agencies,
government entities, etc…
● “DataCite is a leading global non-profit organisation that provides persistent identifiers (DOIs) for
research data. Our goal is to help the research community locate, identify, and cite research
data with confidence.” (datacite.org)
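A standard data citation can be assembled mechanically from a dataset's descriptive metadata and its DOI. The sketch below is illustrative only: the `DatasetRecord` fields, the `format_citation` helper, and the example DOI prefix are assumptions made for this example, and the element order (creators, year, title, version, publisher, identifier) loosely follows the pattern commonly recommended for data citation rather than any one repository's exact output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Minimal descriptive metadata needed to cite a dataset (hypothetical shape)."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str                       # bare DOI without the resolver prefix
    version: Optional[str] = None  # e.g. "V1"; omitted from the citation if None


def format_citation(record: DatasetRecord) -> str:
    """Build a one-line citation ending in a resolvable DOI link."""
    authors = "; ".join(record.creators)
    parts = [f"{authors} ({record.year})", record.title]
    if record.version:
        parts.append(record.version)
    parts.append(record.publisher)
    # The DOI is expressed as an https://doi.org/ URL so the citation is clickable.
    parts.append(f"https://doi.org/{record.doi}")
    return ". ".join(parts)


# Made-up example values; 10.1234 is a placeholder prefix, not a real dataset.
example = DatasetRecord(
    creators=["Doe, Jane", "Roe, Richard"],
    year=2018,
    title="Example Survey Data",
    publisher="Example Repository",
    doi="10.1234/EXAMPLE",
    version="V1",
)
print(format_citation(example))
```

Because the identifier is persistent, the same citation keeps resolving even if the repository later moves the landing page.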
INCREASING DATA AVAILABILITY STATEMENT
REQUIREMENTS
BY PUBLISHERS
The Scientific Community is Establishing Best Practices
for Data Publishing and Replication
DA-RT Journal Policies
Goal: To increase transparency in social science
In 2016, the first group of DA-RT Journals began to post new data sharing and transparency policies:
● American Journal of Political Science's Guidelines for Preparing Replication Materials
● American Political Science Review's DA-RT Guidelines
● Conflict Management and Peace Science DA-RT Guidelines
● The Italian Political Science Review's Replication Policy and Policy for Datasets and Supplemental Files
● State Politics and Policy Quarterly's Guidelines for Preparing Replication Policies
The TOP Guidelines include eight modular standards, each with three levels of increasing stringency. Journals select which of the eight transparency
standards they wish to adopt, and select a level of implementation for each standard. These features provide
flexibility for adoption depending on disciplinary variation, but simultaneously establish community standards.
Transparency, open sharing, and reproducibility are core values of science, but not always part of
daily practice. Journals, funders, and scholarly societies can increase reproducibility of research by
adopting the Transparency and Openness Promotion (TOP) Guidelines and helping them evolve to
meet the needs of researchers and publishers while pursuing the most transparent practices.
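The modular structure described above (eight standards, each adopted at one of three levels) maps naturally onto a small lookup table. The following is a minimal sketch, not an official schema: the `validate_policy` helper and the example journal policy are hypothetical, and the standard names are listed from memory of the published TOP Guidelines, so they should be checked against the Center for Open Science text before reuse.

```python
from typing import Dict, Optional

# The eight TOP standards (names are an assumption to verify against cos.io).
TOP_STANDARDS = (
    "Citation standards",
    "Data transparency",
    "Analytic methods (code) transparency",
    "Research materials transparency",
    "Design and analysis transparency",
    "Preregistration of studies",
    "Preregistration of analysis plans",
    "Replication",
)

# Three levels of increasing stringency.
LEVELS = (1, 2, 3)


def validate_policy(policy: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Check a journal's chosen levels and return the full eight-row policy.

    Standards the journal has not adopted are reported as None.
    """
    unknown = set(policy) - set(TOP_STANDARDS)
    if unknown:
        raise ValueError(f"Not a TOP standard: {sorted(unknown)}")
    for standard, level in policy.items():
        if level not in LEVELS:
            raise ValueError(f"{standard}: level must be 1, 2 or 3, got {level!r}")
    return {standard: policy.get(standard) for standard in TOP_STANDARDS}


# Hypothetical journal that adopts only two of the eight standards.
journal_policy = validate_policy({"Data transparency": 2, "Replication": 1})
for standard, level in journal_policy.items():
    print(f"{standard}: {'not adopted' if level is None else f'level {level}'}")
```

Keeping the adopted standards and their levels separate is what lets journals in different disciplines share one framework while choosing different degrees of stringency.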
Authors Comply with Strong Data Policies
COMMON DATA MANAGEMENT AND CURATION
RELATED CHALLENGES
What challenges in data management and curation are
you anticipating?
© 2017 Technology Networks, all rights reserved
COMMON DISCIPLINE SPECIFIC CHALLENGES IN DATA
SHARING AND CURATION
(E.G. ARTS; HUMANITIES VS. STEM)
Sandra Gesing, Center for Research Computing, University of Notre Dame. Science Gateways: Addressing Data Management Challenges. 7th National Data Service Consortium Workshop, Chicago, 13 April 2017
*IDC Energy Insights for Oil & Gas 2015-2017 report: (2015 Upstream Intelligence, IDC Energy Insights, McKinsey
and Company, Bain and Company)
5 Reasons Healthcare Data Is Unique and Difficult to Measure By Dan LeSueur
The data explosion along the care cycle. NVKVV 16th Colloquium on ICT and Healthcare (16de Colloquium ICT en gezondheidszorg), Tuesday 8 May 2012, De Montil, Moortelstraat 8, Affligem
Eric van 't Hoff, EMEA Healthcare ISV Alliance Manager. Note: updated with latest Dell Storage solutions, December 2013
RESEARCH DATA MANAGEMENT SOLUTIONS WITH
DATAVERSE
Dataverse is an open source web application to share, preserve, cite, explore, and
analyze research data. It facilitates making data available to others, and allows you to
replicate others' work more easily. Researchers, data authors, publishers, data distributors,
and affiliated institutions all receive academic credit and web visibility.
https://dataverse.org/
Data Management Plan
Checklist for data management plan
Template for data management plans
http://best-practices.dataverse.org/data-management/index.html
Dataverse supports:
● Access and Sharing
● File Format Support
● Documentation, Metadata and Bibliographic Information
● Versioning
Dataverse facilitates data access by providing:
● descriptive and variable/question-level search;
● topical browsing;
● data extraction;
● re-formatting;
● on-line analysis
Dataverse performs:
● archival format migration;
● metadata extraction;
● validity checks
The Dataverse application’s “templating” feature will be used for consistency of information across datasets.
The Dataverse repository automatically generates persistent identifiers and Universal Numerical
Fingerprints (UNF) for datasets; extracts and indexes variable descriptions, missing-value codes and labels;
creates variable-level summary statistics; and facilitates open distribution
of metadata in a variety of standard formats (DataCite, DDI 2.5, Dublin Core, VOResource,
and ISA-Tab) and via standard protocols (OAI-PMH, SWORD).
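To illustrate what metadata distribution over OAI-PMH looks like in practice, the sketch below builds harvesting request URLs. It only constructs the URLs and sends nothing over the network. The six verbs and the `metadataPrefix` argument come from the OAI-PMH 2.0 specification; the base URL `https://repository.example.org/oai` and the `build_oai_request` helper are placeholders invented for this example, so a real installation's endpoint should be taken from its own documentation.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_oai_request(base_url: str, verb: str,
                      arguments: Optional[Dict[str, str]] = None) -> str:
    """Return the GET URL for one OAI-PMH request (no network access)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"Unknown OAI-PMH verb: {verb!r}")
    query = {"verb": verb}
    # Arguments are passed as a dict because "from" is a reserved word in Python.
    query.update(arguments or {})
    return f"{base_url}?{urlencode(query)}"


# Hypothetical endpoint; replace with the repository's documented OAI URL.
BASE_URL = "https://repository.example.org/oai"

# Harvest Dublin Core records changed since a given date.
print(build_oai_request(BASE_URL, "ListRecords",
                        {"metadataPrefix": "oai_dc", "from": "2018-01-01"}))
```

A harvester would fetch this URL, parse the XML response, and repeat the request with the returned resumption token until the list is complete.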
Success Stories in Raising Research Visibility with Data Sharing
Murray Archive:
Bulimia Study by Colby
AJPS-
IQSS and the Dataverse Project
● ...to enable bigger, better, faster, and more collaborative
social science
● Transparency at all project levels
○ http://dataverse.org/goals-roadmap-and-releases
○ https://waffle.io/IQSS/dataverse
Key New Features in the Next Year
● Provenance Integration
● Data Locality/Multiple Storage Options
● Streaming Data/Code Deposit
Dataverse Roadmap - Quarter 2
Dataverse Roadmap - Quarter 3
● Search, Dataset and File Redesign
● Additional Data Transfer Options (Rsync/HTTP/Other)
● DataTags Integration
Dataverse Roadmap - Quarter 4
● File Handling (Skip Unzip, Skip Ingest, Uningest)
● Preserve File Hierarchy
● Embargo/Schedule Data Availability
(some) Dataverse Integrations
● Exploration and Visualization
○ TwoRavens, Data Explorer, WorldMap
● Getting Data In
○ OJS, OSF, RSpace
● Getting Data Out
○ Archivematica, Backup Script
● Storage Drivers and Compute Access
○ OpenStack Swift, AWS, Azure (soon)
Dataverse Community
● 50+ code contributors outside of the Core Team
● Most contributors of any Harvard Open Source project
● Hundreds of members of the Dataverse Community -
developers, researchers, librarians, data scientists
○ Dataverse Google Group
○ Dataverse Community Calls
○ Dataverse Community Meeting
DATAVERSE COMMUNITY MEETING, 2018
THANK YOU!
https://groups.google.com/d/forum/dataverse-community
https://github.com/IQSS/dataverse/issues
References
https://www.dataone.org/
https://www.datacite.org/
https://www.rd-alliance.org/open-data
https://www.oecd.org/sti/outlook/e-outlook/stipolicyprofiles/interactionsforinnovation/openscience.htm
https://www.nap.edu/read/5504/chapter/5#61
https://www.force11.org/group/fairgroup/fairprinciples
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002235
https://obamawhitehouse.archives.gov/blog/2013/02/22/expanding-public-access-results-federally-funded-research
http://www.righttoresearch.org/learn/whyoa/index.shtml
https://www.dartstatement.org/
https://datascience.codata.org/articles/10.5334/dsj-2017-009/
Open Access and the Future of Scholarly Communication: Policy and Infrastructure, by Kevin L. Smith and Katherine A. Dickson
https://www.dtls.nl/fair-data/fair-principles-explained
https://cos.io/our-services/top-guidelines/
https://www.cessda.eu/
http://library.harvard.edu/sites/default/files/HarvardPurdue_Workshop_full.pdf
https://www.fosteropenscience.eu/content/what-open-science-introduction
http://www.unesco.org/new/en/communication-and-information/portals-and-platforms/goap/open-science-movement/
http://dataconservancy.org/
http://sciencecommons.org/resources/readingroom/principles-for-open-science/