
Managing, Sharing and Curating · Science can achieve its full potential. Text Mining (not possible...

Date post: 11-Jul-2020
Page 1: Managing, Sharing and Curating · Science can achieve its full potential. Text Mining (not possible behind “subscription” walls). More knowledge leads to better outcomes (for

Managing, Sharing and Curating Your Research Data in a Digital Environment

Sonia Barbosa, Manager of Data Curation, Harvard Dataverse

Danny Brooke, Dataverse Development Project Manager, Harvard Dataverse


Data Sharing Stories

-Different levels of openness in sharing data

-Verification of reproducibility

-Data loss


AGENDA

● Open Science Principles (Open Access, Open Data)

● Connecting Research Articles to Data

● Data Discoverability and Standard Citation

● Increasing Data Availability Statement Requirements by Publishers

● Common Data Management and Curation Related Challenges

● Common Discipline Specific Challenges in Data Sharing and Curation (e.g. Arts; Humanities vs. STEM)

● Research Data Management Solutions with Dataverse

● Success Stories in Reuse of Datasets Found in Open Data Repositories

● Success Stories in Raising Research Visibility with Data Sharing

● Dataverse Roadmap

● Dataverse Integration with Other Data Repositories (e.g. OSF)

● Dataverse Community and How To Get Engaged


OPEN SCIENCE PRINCIPLES

(OPEN ACCESS, OPEN DATA)


https://www.economist.com/news/leaders/21721656-data-economy-demands-new-approach-antitrust-rules-worlds-most-valuable-resource


OPEN SCIENCE IMPLIES...

● Greater access to public research data

● Access enabled by tools and platforms

● Broader collaboration in science

● The use of alternative copyright tools for diffusing research results


THE BENEFITS OF OPEN SCIENCE...

● Reducing the costs of data collection by facilitating the exploitation of dormant or inaccessible data at low cost.

● Increasing the opportunities for collaboration in research as well as in innovation.

● Greater access to research data can also help advance science's contribution to solving global challenges by enhancing access to data on a global scale (e.g. in the case of climate change data).

● Open science can also be used to promote capacity building in developing countries while generating opportunities for scientific collaboration and innovation between developing countries.


WHY OPEN ACCESS?


Image: Aston University Library Services


● Open Access seeks to return scholarly publishing to its original purpose: to spread knowledge and allow that knowledge to be built upon.

● Better visibility and higher impact for your scholarship.

● Avoiding duplication.

● Science can achieve its full potential.

● Text Mining (not possible behind “subscription” walls).

● More knowledge leads to better outcomes (for patients).

● Patients

● Developing countries

● Doctors

● Open Access raises the profile of research performed in the developing world - locally and globally.

● Demonstrated benefits

© 2007-2010 SPARC, subject to a Creative Commons Attribution 3.0 License


WHY OPEN DATA?


“The benefits of Open Data are diverse and range from improved efficiency of public administrations, economic growth in the private sector to wider social welfare.”


“Knowledge is open if anyone is free to access, use, modify, and share it — subject, at most, to measures that preserve provenance and openness.” Open Definition, published by Open Knowledge (2015)


Key requirements for open data

● Availability

● Access

● Redistribution and reuse

© 2007 - 2018 SPARC, subject to a Creative Commons Attribution 4.0 International License


Overview of funders' data policies | Digital Curation Centre: http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies

Arts and Humanities Research Council (AHRC)

Biotechnology and Biological Sciences Research Council (BBSRC)

Engineering and Physical Sciences Research Council (EPSRC)


Copyright © Info-communications Media Development Authority


© 2017 Government of Singapore https://index.okfn.org/methodology/


Guiding principles for the conduct of open science at the Montreal Neurological Institute and Hospital (MNI).

These principles cover five areas: the public release of data and other scientific resources; external research partnerships; the MNI Biobank; researcher and patient autonomy; and intellectual property. The authors developed draft Guiding Principles based on the results of this study. This draft was then presented to the MNI staff, management and researchers, who reviewed and amended the draft during two rounds of discussion and feedback. These Guiding Principles were adopted by the MNI in December 2016.

eLife 2017;6:e29319 DOI: 10.7554/eLife.29319


CONNECTING RESEARCH ARTICLES TO DATA


The FAIR Data Principles

https://www.force11.org/group/fairgroup/fairprinciples


Sünje Dallmeier-Tiessen (CERN) http://slideplayer.com/slide/5768687/


Article DOI

Dataset identifier


DATA DISCOVERABILITY AND STANDARD CITATION


DataCite

● Open Access standards for Datasets

● International in scope, including universities, research institutions, data governance agencies, government entities, etc.

● “DataCite is a leading global non-profit organisation that provides persistent identifiers (DOIs) for research data. Our goal is to help the research community locate, identify, and cite research data with confidence.” (datacite.org)
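
As a rough illustration of what a standard data citation built around a persistent identifier can look like, the sketch below assembles one from a few metadata fields. The field order (creator, year, title, publisher, identifier) follows the commonly recommended DataCite pattern; the function name, the example record and the DOI itself are hypothetical and not taken from the slides.

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Build a citation string: Creators (Year). Title. Publisher. DOI URL."""
    authors = "; ".join(creators)
    doi = doi.strip()
    # Accept a bare DOI, a "doi:" prefix, or a full resolver URL.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return f"{authors} ({year}). {title}. {publisher}. https://doi.org/{doi}"


# Hypothetical record, for illustration only.
example = format_data_citation(
    creators=["Doe, Jane", "Roe, Richard"],
    year=2018,
    title="Example Survey Replication Data",
    publisher="Harvard Dataverse",
    doi="doi:10.1234/EXAMPLE",
)
print(example)
```

Writing the identifier as a resolvable https://doi.org/ link keeps the citation usable by both readers and machines.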


INCREASING DATA AVAILABILITY STATEMENT REQUIREMENTS BY PUBLISHERS


The Scientific Community is Establishing Best Practices for Data Publishing and Replication


The Scientific Community is Establishing Best Practices for Data Publishing and Replication...

DA-RT Journal Policies

Goal: To increase transparency in social science

In 2016, the first group of DA-RT Journals began to post new data sharing and transparency policies:

American Journal of Political Science's Guidelines for Preparing Replication Materials

American Political Science Review's DA-RT Guidelines

Conflict Management and Peace Science DA-RT guidelines

The Italian Political Science Review's Replication Policy and Policy for Datasets and Supplemental Files

State Politics and Policy Quarterly's Guidelines for Preparing Replication Policies


The Scientific Community is Establishing Best Practices for Data Publishing and Replication...

TOP guidelines include eight modular standards, each with three levels of increasing stringency. Journals select which of the eight transparency standards they wish to adopt for their journal, and select a level of implementation for each standard. These features provide flexibility for adoption depending on disciplinary variation, but simultaneously establish community standards.

Transparency, open sharing, and reproducibility are core values of science, but not always part of daily practice. Journals, funders, and scholarly societies can increase reproducibility of research by adopting the Transparency and Openness Promotion (TOP) Guidelines and helping them evolve to meet the needs of researchers and publishers while pursuing the most transparent practices.
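
The modular structure described above (eight standards, a level from 1 to 3 chosen per adopted standard) can be sketched as a small data structure. This is only an illustration: the standard names are paraphrased from the TOP Guidelines and should be checked against the published text, and the function names and the example journal policy are hypothetical.

```python
# Names paraphrased from the TOP Guidelines; verify against the published text.
TOP_STANDARDS = (
    "Citation Standards",
    "Data Transparency",
    "Analytic Methods (Code) Transparency",
    "Research Materials Transparency",
    "Design and Analysis Transparency",
    "Study Preregistration",
    "Analysis Plan Preregistration",
    "Replication",
)
VALID_LEVELS = (1, 2, 3)  # increasing stringency


def validate_policy(policy):
    """Check that each adopted standard is known and has a level of 1, 2 or 3."""
    for standard, level in policy.items():
        if standard not in TOP_STANDARDS:
            raise ValueError(f"unknown standard: {standard!r}")
        if level not in VALID_LEVELS:
            raise ValueError(f"invalid level {level!r} for {standard!r}")
    return dict(policy)


def not_adopted(policy):
    """List the standards a journal has chosen not to adopt."""
    return [s for s in TOP_STANDARDS if s not in policy]


# Hypothetical journal that adopts three of the eight standards.
journal_policy = validate_policy({
    "Citation Standards": 1,
    "Data Transparency": 2,
    "Replication": 3,
})
print(not_adopted(journal_policy))
```

Keeping the adopted standards in a mapping mirrors the flexibility the guidelines aim for: a journal can omit a standard entirely or raise its level later without touching the others.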


The Scientific Community is Establishing Best Practices for Data Publishing and Replication...


Authors Comply with Strong Data Policies


COMMON DATA MANAGEMENT AND CURATION RELATED CHALLENGES


Common Data Management and Curation Related Challenges

What challenges in data management and curation are you anticipating?


© 2017 Technology Networks, all rights reserved


COMMON DISCIPLINE SPECIFIC CHALLENGES IN DATA SHARING AND CURATION (E.G. ARTS; HUMANITIES VS. STEM)


Sandra Gesing, Center for Research Computing, University of Notre Dame. "Science Gateways: Addressing Data Management Challenges." 7th National Data Service Consortium Workshop, Chicago, 13 April 2017.


*IDC Energy Insights for Oil & Gas 2015-2017 report: (2015 Upstream Intelligence, IDC Energy Insights, McKinsey and Company, Bain and Company)


5 Reasons Healthcare Data Is Unique and Difficult to Measure By Dan LeSueur


The data explosion along the care cycle. NVKVV 16th Colloquium on ICT and Healthcare, Tuesday 8 May 2012, De Montil, Moortelstraat 8, Affligem.

Eric van ‘t Hoff, EMEA Healthcare ISV Alliance Manager. Note: updated with latest Dell Storage solutions, December 2013.


RESEARCH DATA MANAGEMENT SOLUTIONS WITH DATAVERSE


Dataverse is an open source web application to share, preserve, cite, explore, and analyze research data. It facilitates making data available to others, and allows you to replicate others' work more easily. Researchers, data authors, publishers, data distributors, and affiliated institutions all receive academic credit and web visibility.

https://dataverse.org/

Data Management Plan

Checklist for data management plan

Template for data management plans

http://best-practices.dataverse.org/data-management/index.html


Dataverse supports:

● Access and Sharing

● File Format Support

● Documentation, Metadata and Bibliographic Information

● Versioning


Dataverse facilitates data access by providing:

● descriptive and variable/question-level search;

● topical browsing;

● data extraction;

● re-formatting;

● on-line analysis

Dataverse performs:

● archival format migration;

● metadata extraction;

● validity checks;

The Dataverse application’s “templating” feature will be used for consistency of information across datasets.

The Dataverse repository automatically generates persistent identifiers and Universal Numerical Fingerprints (UNF) for datasets; extracts and indexes variable descriptions, missing-value codes and labels; creates variable-level summary statistics; and facilitates open distribution of metadata in a variety of standard formats (DataCite, DDI v2.5, Dublin Core, VOResource, and ISA-Tab) and protocols (OAI-PMH, SWORD).
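
To make the OAI-PMH part concrete, here is a minimal sketch of how a harvester might build a request URL for Dublin Core records. The verbs and the `oai_dc` metadata prefix come from the OAI-PMH 2.0 protocol; the base URL is a hypothetical placeholder (the actual endpoint depends on the installation), the helper name is made up, and no network request is sent.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_oai_request(base_url, verb, **params):
    """Return an OAI-PMH request URL; parameters set to None are skipped."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb!r}")
    query = {"verb": verb}
    query.update({k: v for k, v in params.items() if v is not None})
    return f"{base_url}?{urlencode(query)}"


# Hypothetical repository endpoint, used only to show the URL shape.
url = build_oai_request(
    "https://repository.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
)
print(url)
```

A harvester would fetch this URL and parse the XML response; that step is left out here because it depends on the repository and the metadata format requested.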


Success Stories in Raising Research Visibility with Data Sharing

Murray Archive:

Bulimia Study by Colby

AJPS-


IQSS and the Dataverse Project

● ...to enable bigger, better, faster, and more collaborative social science

● Transparency at all project levels

○ http://dataverse.org/goals-roadmap-and-releases

○ https://waffle.io/IQSS/dataverse


Key New Features in the Next Year


Dataverse Roadmap - Quarter 2

● Provenance Integration

● Data Locality/Multiple Storage Options

● Streaming Data/Code Deposit


Dataverse Roadmap - Quarter 3

● Search, Dataset and File Redesign

● Additional Data Transfer Options (Rsync/HTTP/Other)

● DataTags Integration


Dataverse Roadmap - Quarter 4

● File Handling (Skip Unzip, Skip Ingest, Uningest)

● Preserve File Hierarchy

● Embargo/Schedule Data Availability


(some) Dataverse Integrations

● Exploration and Visualization

○ TwoRavens, Data Explorer, WorldMap

● Getting Data In

○ OJS, OSF, RSpace

● Getting Data Out

○ Archivematica, Backup Script

● Storage Drivers and Compute Access

○ Openstack Swift, AWS, Azure (soon)


Dataverse Community

● 50+ code contributors outside of the Core Team

● Most contributors of any Harvard Open Source project

● Hundreds of members of the Dataverse Community: developers, researchers, librarians, data scientists

○ Dataverse Google Group

○ Dataverse Community Calls

○ Dataverse Community Meeting


Dataverse Community


DATAVERSE COMMUNITY MEETING, 2018


THANK YOU!

[email protected]

https://groups.google.com/d/forum/dataverse-community

https://github.com/IQSS/dataverse/issues


References

https://www.dataone.org/

https://www.datacite.org/

https://www.rd-alliance.org/open-data

https://www.oecd.org/sti/outlook/e-outlook/stipolicyprofiles/interactionsforinnovation/openscience.htm

https://www.nap.edu/read/5504/chapter/5#61

https://www.force11.org/group/fairgroup/fairprinciples

http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002235

https://obamawhitehouse.archives.gov/blog/2013/02/22/expanding-public-access-results-federally-funded-research

http://www.righttoresearch.org/learn/whyoa/index.shtml

https://www.dartstatement.org/

https://datascience.codata.org/articles/10.5334/dsj-2017-009/


References

Kevin L. Smith and Katherine A. Dickson, Open Access and the Future of Scholarly Communication: Policy and Infrastructure

https://www.dtls.nl/fair-data/fair-principles-explained

https://cos.io/our-services/top-guidelines/

https://www.cessda.eu/

http://library.harvard.edu/sites/default/files/HarvardPurdue_Workshop_full.pdf

https://www.fosteropenscience.eu/content/what-open-science-introduction

http://www.unesco.org/new/en/communication-and-information/portals-and-platforms/goap/open-science-movement/

http://dataconservancy.org/

http://sciencecommons.org/resources/readingroom/principles-for-open-science/

