Introduction to Dataverse
Meghan Goodchild
Scholars Portal & Queen’s University
April 19, 2018
Outline
● Dataverse and research data management (RDM)
● Scholars Portal Dataverse
● Data deposit
● Data discovery and re-use
● Upcoming developments
● How to get started
What is Dataverse?
● Open source research data repository software
● Store, share, publish and discover research data
https://dataverse.org/researchers
Dataverse and research data management (RDM)
https://portagenetwork.ca/fr/
Scholars Portal Dataverse
Dataverses, Datasets, Files
Dataverse = Container for datasets and/or dataverses
Dataset = Container for your data, documentation, and code
Usage
Usage at a glance
• Total size of published datasets: 401 GB
• Total number of downloads: 71,769
• Number of published datasets: 957
• Number of registered users: 889
Usage - Downloads over time
Usage statistics are for Harvard Dataverse: https://dataverse.org/metrics
Dataverse usage - Downloads by dataset
Dataset name (Dataverse name)
• Forum Research Political Poll - Federal Issues (Canada) 2013 (University of Toronto): 3809 downloads
• Theano-based large-scale visual recognition with multiple GPUs (University of Guelph Dataverse)
• Assessment of the fecal microbiota in beef calves (University of Guelph Dataverse)
• 2011 National Household Survey, Forward Sortation Area (FSA) Level [custom tabulation] (University of Toronto)
• Open access article processing charges May 2014 (University of Ottawa Dataverse)
Other download counts shown in the chart: 2998, 1369, 1369, 765, 673
Usage - Disciplinary coverage
Usage statistics are for Harvard Dataverse: https://dataverse.org/metrics
Data deposit
● Institutional dataverses
● Deposit types
● Metadata
● Versioning
● File formats
Institutional dataverses
• All data organized by institution (general root available for non-affiliates, multi-institutional projects)
• Researchers deposit in Institutional Dataverses (defined by user affiliation)
• Library administers institutional space
• Customizable features (branding, featured Dataverses, facets, etc.)
Mediated deposit
Self-deposit
• Open to anyone to deposit and publish data
• Usage statistics at the institution level to track published data
Note: SP Terms of Service covers removal of data if necessary (http://guides.scholarsportal.info/dataverse)
Metadata Standards
• Citation (DataCite, Dublin Core, DDI)
• Social Science (DDI)
• Astronomy (Virtual Observatory VOResource)
• Life Science (ISA-Tab)
Dataset Versioning
Dataset Versioning (continued)
File formats
All file formats are supported
• Tabular data files (SPSS, R, Excel, CSV)
• Geospatial files
• Images
SP Dataverse storage
• Maximum file size: 2 GB per file
• Unzipping and tabular ingest are processing-intensive
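A per-file size limit is easy to check before starting a deposit. The sketch below walks a local folder and lists files above the limit. It assumes "2 GB" means 2 * 1024**3 bytes (the slide does not say whether binary or decimal units are meant), and the function name `oversized_files` is hypothetical, not part of Dataverse.

```python
import os

# Assumption: "2 GB" read as binary gigabytes; adjust if the repository
# counts in decimal units (2 * 1000**3).
MAX_FILE_BYTES = 2 * 1024**3


def oversized_files(folder, limit=MAX_FILE_BYTES):
    """Return sorted relative paths of files under `folder` larger than `limit` bytes."""
    too_big = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) > limit:
                too_big.append(os.path.relpath(path, folder))
    return sorted(too_big)
```

Calling `oversized_files("my_dataset")` on a local folder returns the files that would need to be split, compressed, or handled separately before upload.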
Data discovery and re-use
● Linking research outputs
● Permissions and licensing
● Discovery and access
● Collaboration
● Data explorer
ORCID and DataCite integration
• Mechanisms for linking research output
• ORCID ID field
• ORCID sign-in (not configured)
• DataCite Canada DOI minting
• DataCite indexing
Cross-referencing research outputs
Reference and link to publication in Dataverse record
Cite and link to data in publication
Obtain a DOI / data citation before publishing data
Provide private access to data during review process
Licensing
• Default to CC0 (open data)
OR
• Custom terms of use ‘Data Usage Agreement’
• Restricted files and custom terms of access
User/groups and roles
• Assign permissions for collaborators, curators, file downloaders (access)
• Granular file-level permissions
• IP group-based permissions
Dataset and file permissions
Dataset and file permissions (continued)
Discovery and Access
• Open Archives Initiative (OAI) protocol for metadata harvesting (OAI-PMH)
• Metadata from published, unrestricted datasets can be harvested
• Dataverse APIs
• Search API
• Data Access API
• SWORD API (upload)
• Native API
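As a rough illustration of the Search API, the sketch below builds a query URL and pulls dataset names and identifiers out of a response body. It assumes the standard `/api/search` endpoint with `q`, `type`, and `per_page` parameters, and uses the production URL from the "Get started" slide as the base. The helper names and the sample response values are invented for illustration. No network request is made here.

```python
from urllib.parse import urlencode

# Assumption: base URL of the installation to query; replace as needed.
BASE_URL = "http://dataverse.scholarsportal.info"


def build_search_url(query, item_type="dataset", per_page=10, base_url=BASE_URL):
    """Return a Search API URL for the given query string."""
    params = {"q": query, "type": item_type, "per_page": per_page}
    return f"{base_url}/api/search?{urlencode(params)}"


def summarize_items(response_json):
    """Extract (name, global_id) pairs from a parsed Search API response."""
    items = response_json.get("data", {}).get("items", [])
    return [(item.get("name"), item.get("global_id")) for item in items]


# Invented response fragment, shaped like a Search API reply (not real data).
sample_response = {
    "status": "OK",
    "data": {
        "total_count": 1,
        "items": [
            {
                "name": "Example dataset",
                "type": "dataset",
                "global_id": "doi:10.5072/FK2/EXAMPLE",
            }
        ],
    },
}

print(build_search_url("fecal microbiota"))
print(summarize_items(sample_response))
```

In practice the URL would be fetched with an HTTP client and the JSON body passed to `summarize_items`. Only published, unrestricted content is expected in anonymous results.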
Guestbooks
Who is downloading my data?
• User fills out guestbook form
• Owner downloads guestbook report
• Can be used to mediate access / approval of access to data
Collaborative data sharing
• Group permissions
• Assign account roles
• Track changes through versioning
Data Explorer: Chart view
Data Explorer: Cross tabulation
Preservation
Current support:
• Native support for file verification: checksums, UNF
• File format identification
• DOIs for persistent identification of data
• Derivative generation (tabular data - .tsv/tab)
Needed:
• Policies required
• Repositories have some responsibilities (short term)
• Institution-focused long-term archiving
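Checksum-based file verification can also be done on the downloader's side. The sketch below computes a file's MD5 digest and compares it with a published value. It assumes the repository reports an MD5 checksum for the file. Other algorithms would need a different `hashlib` constructor, and UNF (used for tabular data) is not covered. The function names are hypothetical.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a local file against a published MD5 value (case-insensitive)."""
    return file_md5(path) == expected_md5.strip().lower()
```

A mismatch after download suggests the file was corrupted or altered in transit and should be fetched again.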
Coming Next
• Upgrading to 4.8.6 with new French translations (UdeM)
• Dataverse-Archivematica integration
• Investigating Shibboleth login support
• Persistent identifiers at the file level
• Support for file hierarchy
• Large file support and HTTP upload support
Coming Next: Dynamic web report/metrics
• Institutional listing
• Downloads by month
• File size by Dataverse
• File type
• Number of datasets by month
• Subject coverage
Why use Dataverse?
• Supports FAIR data principles: Findable, Accessible, Interoperable, Reusable
• Secure data management
• Effective sharing
• Long-term access and preservation
• Increase research visibility
Get started with using Dataverse
Scholars Portal Demo Dataverse
http://demodv.scholarsportal.info
Scholars Portal Production Dataverse
http://dataverse.scholarsportal.info
Free, open for all researchers in Canada
SP Dataverse support team
Amber Leahey - Data & GIS Librarian
Meghan Goodchild - RDM Systems Librarian (Queen's / SP)
Kaitlin Newson - Digital Projects Librarian
Kevin Worthington - Data/GIS Programmer
Jayanthy Chengan - Senior Developer
Bikram Singh - Systems Analyst
Contact us at [email protected]
RDM team at uOttawa
Resources
Dataverse Usage Statistics (2014-2017)
• Report | Spreadsheet
User Guides
• SP Guide | Harvard Guide
Questions?