GEN-8001: Take control of your PhD journey
Research data management
Part 2: Best practice recommendations
Data without sensitive information
Philipp Conzett
University Library
February 27, 2020
Best practice for research data management
Overall recommendation in brief: Manage your data according to the FAIR principles throughout the lifecycle of research data!
The FAIR principles of data management
Findable
• Published with a persistent identifier
• Good metadata
• Indexed in search engines
DOI = Digital Object Identifier = a type of persistent identifier ~ persistent URL
Metadata = description of data
Examples of metadata:
• Author, title, description, …
• Keywords
• Geographical information
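A DOI becomes a persistent link when it is appended to a resolver address. The sketch below assumes the public https://doi.org/ resolver and uses the DOI of the UiT Open Research Data example reference given in the citation part of these slides; the function name doi_to_url is made up for illustration.

```python
def doi_to_url(doi: str) -> str:
    """Return a resolver URL for a DOI (assumes the https://doi.org/ resolver)."""
    doi = doi.strip()
    # Accept input that already carries a "doi:" prefix or a resolver address.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with the directory indicator "10."
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi


print(doi_to_url("10.18710/QEOFPY"))
```

The link stays the same even if the repository moves the landing page, which is the point of citing the identifier rather than the page address.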
Accessible
• Published data should be accessible through a well-defined and open protocol (e.g. http, ftp).
• If necessary, the protocol must enable sufficient authentication of users, e.g. for access to sensitive data.
Interoperable
• Metadata are based on common metadata standards. This applies to both
  – general metadata, e.g. the international date format ISO 8601: YYYY-MM-DD (2019-12-09), and
  – domain-specific metadata, e.g. Darwin Core = a standard for description of data on biological diversity.
• Keywords are based on controlled metadata vocabularies, e.g. Darwin Core standard values for age class or life stage of a biological individual.
• Interoperability enables search and re-use across datasets and repositories.
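Both kinds of standard can be checked automatically before data are shared. The sketch below tests one metadata record for an ISO 8601 date and for a keyword from a controlled vocabulary. The record layout, the function name and the four life-stage terms are hypothetical placeholders, not the actual Darwin Core values; a real project would take the permitted terms from the standard it follows.

```python
import re
from datetime import date

# Hypothetical controlled vocabulary, for illustration only.
LIFE_STAGE_TERMS = {"egg", "larva", "juvenile", "adult"}


def check_record(record: dict) -> list:
    """Return a list of problems found in one metadata record."""
    problems = []
    value = record["date"]
    # The pattern enforces the YYYY-MM-DD layout; fromisoformat() then
    # rejects impossible calendar dates such as 2019-02-30.
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        try:
            date.fromisoformat(value)
        except ValueError:
            problems.append(f"not a valid calendar date: {value!r}")
    else:
        problems.append(f"date is not YYYY-MM-DD: {value!r}")
    if record["life_stage"] not in LIFE_STAGE_TERMS:
        problems.append(f"term not in vocabulary: {record['life_stage']!r}")
    return problems


print(check_record({"date": "2019-12-09", "life_stage": "adult"}))
print(check_record({"date": "09.12.2019", "life_stage": "grown-up"}))
```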
Re-usable
• Data are documented, so that your peers understand them and are able to (re-)use them; e.g. in a ReadMe file.
• Data are archived in preferred/sustainable file formats, so that the files can be opened and read also in the long term; e.g. tab-separated plain text (.txt) instead of Excel spreadsheets (.xlsx).
• Data are furnished with a clear use license, so that researchers who want to use them know what they are allowed to do with them. Example: Creative Commons (CC) licenses
Together: FAIR
Findable
• Published with a persistent identifier
• Good metadata
• Indexed in search engines
Accessible
• Well-defined and open protocol
• Sufficient support for authentication
Interoperable
• Open metadata formats
• Common standards
• Controlled vocabularies
Re-usable
• Documentation
• Open file formats
• Clear use licenses
Best practice for research data management
In brief: Manage your data according to the FAIR principles throughout the lifecycle of research data!
https://guides.library.ucsc.edu/datamanagement
1. The planning phase, when you plan how you are going to search for existing data, and collect, process and archive/share your own data.
2. The active phase, when you process and analyse your research data, and when you work on your publication.
3. The archiving and publishing phase, when you have finished processing and analysing your data, and you are ready to submit your paper. Depending on the nature of your data, you archive and share them openly or with access restrictions.
>> In order to plan, you should have a good grasp of the other main phases.
How to be FAIR in the different phases?
Planning phase:
• How to search for and cite existing data
• How to write a data management plan (last part)
Active phase:
• Good routines for organizing and documenting your data
• Good routines for data storage
Archiving and publishing phase:
• Choose a trusted and FAIR-aligned repository
Search for existing data
• Good practice at the outset of your research project:
  – Literature survey
  – AND data survey
• Sources / places to search:
  – Research group, supervisor, colleagues, ...
  – Publications (articles or books): References to data or related content
References to data in publications
Related content in publications
Registry of Research Data Repositories
Find relevant repositories by
• browsing by subject, content type or country
• searching, including filtering the results according to several attributes
Filter options: e.g. Reference to research data: Persistent Identifier (PID)
Example: UiT Open Research Data
• UiT’s repository for open research data
• Data from any discipline (= generic repository)
• For data deposits from staff and students at UiT
Registry vs. repository
Note:
• In re3data (the Registry of Research Data Repositories) you search for archives/repositories, not for specific datasets. Therefore, you should use general search terms, such as the name of your discipline, e.g. chemistry or linguistics.
• Once you have found a relevant repository, you can start searching more specifically for the type of data you are looking for. In the repository you can use more specific search terms, and you should also try synonyms if you get few or no results.
Search for existing data
• Good practice at the outset of your research project:
  – Literature survey
  – AND data survey
• Sources / places to search:
  – Research group, supervisor, colleagues, ...
  – References to data or related content in publications
  – Directory, e.g. Registry of Research Data Repositories
  – Research data repository/-ies
  – Search engines / discovery services
DataCite Search
Search for datasets across many research data repositories
BASE Bielefeld
Google Dataset Search
Cite research data
Once you have found suitable research data and have reused them in your own project:
• Give credit to the author of the dataset.
• Use the reference provided by the repository (may also be downloaded in various formats).
• Example reference from UiT Open Research Data:
Runge, Claire A.; Daigle, Remi M.; Hausner, Vera H., 2020, "Replication data for: Quantifying tourism booms and the increasing footprint in the Arctic with social media data", https://doi.org/10.18710/QEOFPY, DataverseNO, V1, UNF:6:czqP04pzsqnMiDPqDgKLpg== [fileUNF]
Cite research data
• Research data should be referred to in the same way as other sources you have used in your research.
Reference list
• Add reference to the reference list together with the other sources:
In-text citation
Why?
http://phdcomics.com/comics/archive.php?comicid=1323
Do you know where your data are?
Why?
“..a systematic human error in coding the name of the files had been made during the extraction of the EEG template topographic maps best differentiating the two experimental conditions at the single subject level.”
http://retractionwatch.com/2014/01/07/doing-the-right-thing-authors-retract-brain-paper-with-systematic-human-error-in-coding/
How to organize your data in a persistent way?
Also many years from now.
Restructuring and reformatting afterwards can be time-consuming and tedious.
Data must be understandable for others.
Think of structuring data early in the project.
Key issues:
• Data storage
• File and folder naming
• File description
• File format
For more information on how to structure and document your data, see UiT Open Research Data Deposit Guidelines: https://site.uit.no/dataverseno/deposit/prepare/
Good storage routines – why?
Avoid data loss!
Why YOU need a data management plan, CC-BY-2.0, by Peter Murray-Rust
Good storage routines – how?
• Regular backup
• Versioning = keep track of changes
• Several backups:
  – Here: e.g. your computer
  – Near: e.g. your home directory at UiT (\\homer.uit.no)
  – Far: e.g. cloud service:
    » Office 365 OneDrive
    » Office 365 SharePoint - Shared storage areas – automatic backup & versioning
Info & help (for more advanced storage services):
• UiT Research Data Portal: https://uit.no/researchdata >> Working with your research data
• Separate course module: How to store research data
File and Folder Naming and Organizing
Some fundamental file naming recommendations:
• Files should be named consistently
• File names should be descriptive, but short (< 25 characters)
• Use underscores ( _ ) instead of spaces
• Avoid special characters like “ / \ : * . ? ‘ < > [ ] ( ) & $ æÆ øØ åÅ äÄ ….
• Use the international dating convention YYYY-MM-DD
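The recommendations above can be turned into a small check that is run over a folder listing. This is only a sketch: the function name, the wording of the messages, and the decision to apply the length limit to the name without its extension are assumptions, and the set of permitted characters (letters, digits, underscore, hyphen) is one strict reading of the list of characters to avoid.

```python
import re
from pathlib import PurePath

MAX_STEM_LENGTH = 25  # assumption: the limit applies to the name without extension
ALLOWED = re.compile(r"[A-Za-z0-9_-]+")  # letters, digits, underscore, hyphen


def check_file_name(name: str) -> list:
    """Return the naming recommendations that `name` does not follow."""
    stem = PurePath(name).stem  # file name without the final extension
    issues = []
    if " " in stem:
        issues.append("contains spaces; use underscores")
    # Spaces are reported separately, so ignore them in the character check.
    if not ALLOWED.fullmatch(stem.replace(" ", "")):
        issues.append("contains special characters")
    if len(stem) >= MAX_STEM_LENGTH:
        issues.append("longer than recommended")
    return issues


for candidate in ("2014-02-24_UVVis_KMnO4_2.txt", "Notes Dec2018.docx"):
    print(candidate, check_file_name(candidate))
```

A check like this does not judge whether a name is descriptive; that still needs a human reader.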
File Ordering
Source: The University of California, Santa Cruz, Data Management Library Guides, Movie Tutorials, Module 1.3: File Naming. Viewed May 24, 2016 at http://guides.library.ucsc.edu/datamanagement/movies
Order by date:
1955-04-12_notes_MassObs.docx
1955-04-12_questionnaire_MassObs.pdf
1963-12-15_notes_Gorer.docx
1963-12-15_questionnaire_Gorer.pdf
Order by content:
Gorer_notes_1963-12-15.docx
Gorer_questionnaire_1963-12-15.pdf
MassObs_notes_1955-04-12.docx
MassObs_questionnaire_1955-04-12.pdf
Order by type:
Notes_Gorer_1963-12-15.docx
Notes_MassObs_1955-04-12.docx
Questionnaire_Gorer_1963-12-15.pdf
Questionnaire_MassObs_1955-04-12.pdf
Force order with numbering:
01_MassObs_questionnaire_1955-04-12.pdf
02_MassObs_notes_1955-04-12.docx
03_Gorer_questionnaire_1963-12-15.pdf
04_Gorer_notes_1963-12-15.docx
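The reason these schemes work is that file browsers sort names alphabetically, so whatever comes first in the name decides the order, and a YYYY-MM-DD prefix makes alphabetical order equal to chronological order. The sketch below reproduces the first two listings with plain sorting; the helper content_first is a hypothetical name and assumes exactly three underscore-separated parts.

```python
files = [
    "1963-12-15_notes_Gorer.docx",
    "1955-04-12_questionnaire_MassObs.pdf",
    "1963-12-15_questionnaire_Gorer.pdf",
    "1955-04-12_notes_MassObs.docx",
]


def content_first(name: str) -> str:
    """Rearrange date_kind_source.ext into source_kind_date.ext."""
    stem, dot, ext = name.rpartition(".")
    day, kind, source = stem.split("_")
    return f"{source}_{kind}_{day}{dot}{ext}"


# Alphabetical sorting of date-first names gives chronological order.
by_date = sorted(files)
# The same files renamed content-first group by source instead.
by_content = sorted(content_first(name) for name in files)

print(by_date)
print(by_content)
```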
Folder naming and organization
Use a consistent strategy.
Main structure of data should be visible in the file names.
- also useful for archiving afterwards.
Document it in a ReadMe-file.
ReadMe file: User guide to your data
In short: Enough information to understand and replicate/re-use your data
• Describe
  – contact information
  – what the dataset is about
  – file structure and naming conventions
  – where to find which data = overview of your files
  – methods and workflow
  – column headings in tabular data
  – abbreviations
  – units of measurement
  – …
• Start documentation early!
• Save ReadMe file as Unicode UTF-8 (.txt) or PDF/A
• Check specific metadata requirements (discipline, archive, ...)
Examples of domain-specific metadata standards: See this overview
ReadMe file – made-up example
ADMINISTRATIVE INFORMATION
Project: Kristin’s important chemistry project
Date: June 2016-April 2017
Description: Description of my awesome project here
Funder: Department of Energy, grant no: XXXXXX
Contact: Kristin Briney, [email protected]
CONTENT
This dataset is about …
FILE AND FOLDER ORGANIZATION
All files live in the ‘ImportantProject’ folder, with content organized into subfolders as follows:
‘RawData’: All raw data goes into this folder, with subfolders organized by date
‘AnalyzedData’: Data analysis files
‘Documentation’: Scanned copies of my written research notes and other research notes
‘Miscellaneous’: Other information that relates to this project
(Adapted from README.txt, http://dataabinitio.com/?p=378)
ReadMe file - example (cont.)
COLUMN HEADINGS AND ABBREVIATIONS
Explanation of column headings used in DataFile01:
• H1 contains ...
• H2 contains ...
Explanation of abbreviations used in DataFile01:
• A1 means ...
• A2 means ...
NAMING
Raw data files will be named as follows:
“YYYY-MM-DD_experiment_sample_ExpNum”
(ex: “2014-02-24_UVVis_KMnO4_2.txt”)
(Adapted from README.txt, http://dataabinitio.com/?p=378)
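Starting the documentation early is easier with an empty template in the project folder from day one. The sketch below builds such a skeleton from the section headings of the made-up example; the function name, the title line and the placeholder text are assumptions, and the headings should be adapted to what your discipline and archive require.

```python
SECTIONS = [
    "ADMINISTRATIVE INFORMATION",
    "CONTENT",
    "FILE AND FOLDER ORGANIZATION",
    "COLUMN HEADINGS AND ABBREVIATIONS",
    "NAMING",
]


def readme_skeleton(project: str) -> str:
    """Return the text of an empty ReadMe file for `project`."""
    lines = [f"ReadMe for: {project}", ""]
    for heading in SECTIONS:
        # One heading, one placeholder, one blank line per section.
        lines.extend([heading, "(to be filled in)", ""])
    return "\n".join(lines)


print(readme_skeleton("ImportantProject"))
# To save the result as Unicode UTF-8 text:
#   pathlib.Path("README.txt").write_text(text, encoding="utf-8")
```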
Preparing for archiving
Selection of data
• Which data are necessary to understand and replicate your study?
• Raw version? Processed/analyzed version(s)? …
• Do not exclude negative / null data.
Need for anonymization and / or aggregation?
• (cf. person-identifying / sensitive data)
Provide your data in (a) preferred file format(s) to support long-term use.
Preferred file formats
Usually
• non-proprietary,
• open and based on documented international standards,
• in common usage by the research community,
• using standard character encodings (e.g. ASCII, UTF-8),
• uncompressed (loss-less)
Archiving: Preferred file formats
Preferred file formats for common document types
Document type | Non-persistent format (examples) | Persistent format
Text | MS Word (.docx) | PDF/A
Spreadsheet | MS Excel (.xlsx) | Tab-separated Unicode UTF-8 text (.txt)
Image | Windows Bitmap (.bmp) | Uncompressed TIFF
Sound | AAC (.m4a) | WAV
Video | Quicktime (.mov) | MPEG-4
Database | MS Access (.accdb) | XML or tab-separated Unicode text (.txt)
More information in UiT Open Research Data depositing guidelines
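For tabular data that already live in a script, the persistent spreadsheet format from the table can be produced with Python's standard csv module. This is a minimal sketch with a made-up function name and made-up sample rows; reading an existing .xlsx file is not covered here, since that needs either Excel's own "Save as" function or a third-party library.

```python
import csv
import io


def rows_to_tsv(header: list, rows: list) -> str:
    """Return the table as tab-separated text, one record per line."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


text = rows_to_tsv(["Sample", "Score"], [["A", 0.25], ["B", 0.75]])
print(text)
# To archive the result as Unicode UTF-8 text:
#   with open("scores.txt", "w", encoding="utf-8", newline="") as handle:
#       handle.write(text)
```

Stating the encoding explicitly matters because the default differs between operating systems.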
How to choose a suitable repository?
• Requirements from funder or publisher?
• Person-identifying data?
• Domain-specific repository?
• Open data? >> Institutional repository
Note: In UiT Open Research Data, your dataset will be reviewed/curated (= checked for compliance with deposit guidelines) before it is published.
Most importantly:
Choose a repository that meets UiT’s requirements, including:
• Data must be made openly available (provided this is not prevented by any legal, ethical, security, or commercial reasons).
• Permission to exploit and/or publish the research data shall not be granted to commercial parties without UiT retaining the rights to make the data openly accessible for reuse.
• If you are unsure, contact us at [email protected]!
What is a data management plan?
• A data management plan (DMP) is a plan that documents how you are going to manage your research data during and after the project period.
• Covers all phases in the research data management lifecycle
  – Search
  – Collection
  – Processing: analysis and storage
  – Archiving and sharing
• Created before project start, but usually revised during the project >> active document!
What is the purpose of a data management plan?
• Helps you to keep track of your data throughout the whole project, and thus to make your data as FAIR as possible
• Helps you to save time and extra work later on, e.g. when you archive and share your data
• Requested by research funders such as the Research Council of Norway (NFR) and the European Research Council (ERC)
• Requested by UiT
Typical aspects to be documented in a Data Management Plan
General information about the research project
• Project name
• Description of project
• Part of a larger research project?
• Funding
• Project leader and participants (name and institution)
Responsibilities and rights
• Who is responsible for follow-up and revision of the DMP?
>> UiT guidelines: The project leader (PI)
• How will responsibilities be divided? External collaborators?
>> Training session on collaboration agreements
• Who has the right to manage the data?
>> Collection, structuring, revision, processing, etc.
• Who can access the data during the project?
>> Use (view or download), but not manage
• Who has ownership of the data?
>> UiT guidelines: UiT The Arctic University of Norway, if no other agreement(s) are in place
Collecting/generating data
• What kind of data will be collected/generated? Sources?
– E.g., observations, simulations, interviews
• Standards and methods for collection/generation?
• When will the data be collected/generated?
• What type of data?
– Text, image, numerical data, sound, etc.
• Need for extra hardware or software?
• Need for special expertise?
• If there are data in this subject already, what are the possibilities for integration and reuse?
>> Training session on search and citation
Documentation and metadata
• How will the data be documented? (e.g. ReadMe file)
• If metadata standards are used, which ones?
• Examples of subject-specific metadata standards: overview
• What file formats will be used? (cf. preferred file formats)
• What kind of folder structure and filename conventions will be used?
• Is special software for reading/interpreting the data necessary?
Storage and preservation during the project
• What are the procedures for storage and backup, and where will this be done?
• What is the expected file size for the data?
• Do you have sufficient storage possibilities or need for extra services?
• Who is responsible for backup and restoring the data?
>> IT Department if stored at UiT facilities
• For collecting in the field, how will the data be safely transferred from the field to the main storage facility?
Archiving and sharing (1)
• Which data will be preserved and which will be destroyed?
• Will the data be long-term preserved, and how is this decided?
• Will the data or a selection of the data be openly shared, and if so, which data?
>> UiT guidelines: as a rule data shall be made openly available
• If data will not be shared, what is the reason? (Cf. As open as possible, as closed as necessary; ethical, legal or other issues)
• Do the data need processing before they can be shared?
>> E.g. anonymization, aggregation, conversion to preferred file formats.
Archiving and sharing (2)
• Where will data, metadata, documentation and code associated with the data be archived?
  – Requirements from funder or publisher?
  – Person-identifying data?
  – Domain-specific repository?
  – Open data? >> Institutional repository (= part of DataverseNO, a national, generic repository for open research data; cf. https://info.dataverse.no)
Archiving and sharing (3)
• When will the data be made openly available, and how long will they be stored?
>> (UiT guidelines: as early as possible, no later than the time of publication of article/book)
• How will the data be licensed for reuse?
>> (UiT guidelines: as few limitations as possible)
>> The standard license in UiT Open Research Data is CC0 (= no restrictions, but cf. addition: “Our Community Norms as well as good scientific practices expect that proper credit is given via citation.”)
>> Training session about licensing
• Are there other conditions, restrictions or embargo on use?
Ethics and consent
• Special rules for person-identifying and sensitive data, e.g. consent, protection of participant identity
• Training course in collaboration with NSD
• Ethics Portal UiT: http://uit.no/etikk
• Data protection officer at UiT: Joakim Bakkevold ([email protected])
• Contact: [email protected]
What should a DMP look like?
There is a range of DMP templates, depending on the type of project:
• Project subject to notification to NSD and/or approval by REK: NSD template
• Project funded by EU, Horizon 2020: own template, also found in tools like DMPonline
• All other projects: UiT template (Word document)
More information and examples of completed DMPs
• UiT Research data portal (https://uit.no/researchdata)
>> Section Plan your work with research data
For feedback you can send the completed plan to: [email protected]
>> Note! Ask for feedback well in advance of the deadline for submitting the plan!
Follow-up of DMPs at UiT
Guidelines for the follow-up of data management plans at UiT
Follow-up embedded in existing routines for different project types:
• PhD projects are followed up in connection with routines for the individual PhD programs.
• Externally funded projects are followed up in connection with other approval routines for externally funded projects.
• For other types of research projects, no new decision structures have been introduced to follow up the research of the individual employees. The UiT management encourages the faculties to follow up their researchers in relevant meetings and arenas so that research data and data management plans are put on the agenda.
More information and help
UiT Research Data Portal: uit.no/researchdata
Email: [email protected]
National node in the Research Data Alliance, RDA in Norway: https://rd-alliance.org/groups/rda-norway (for researchers and support staff)
RDM courses at UiT
1. INTRODUCTORY COURSE
• Research data management at UiT: An introduction
2. MODULAR COURSES
• How to search and cite research data
• How to select an appropriate license
• How to structure and document research data
• How to share research data
• How to write a data management plan
3. SPECIAL ISSUES
• Quantitative data (in collaboration with NSD)
Classroom / Skype
Norwegian / English
Activity: Making your data reusable
(see handout)
Folder: Tromsø class F
Files in the folder:
• Notes Dec2018.docx
• Notes Nov2018_new.docx
• 2018 oktober med notater.docx
• Original interview guide.doc
• Data-class6.xlsx
• Data class F treated.nvp
• Data class F treated V2.nvp
• Data class E treated together with class F_ÅseØstmo.nvp
Part of the content in the file Data-class6.xlsx
 | Group | Lng | date | Score
Troms-1 | 1 | Eng | 8 Nov 2018 | 0.25
Troms-2 | 1 | Eng | 8 Nov 2018 | 0.75
Troms-3 | 1 | Eng | 8 Nov 2018 | 1.25
Troms-1 | 1 | Fra | Nov 17 2018 | 1.75
Troms-1 | 2 | Spa | 2018-11-22 | 2.25
Troms-2 | 2 | Spa | 2018-11-22 | 2.75
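One of the problems in this table is that the date column mixes three notations. A possible clean-up step is sketched below: it tries each notation in turn and returns the date as YYYY-MM-DD. The function name and the list of accepted formats are assumptions based only on the values shown above, and the month abbreviations are parsed with the English names that Python uses in its default locale.

```python
from datetime import datetime

# Notations seen in the sample: 2018-11-22, 8 Nov 2018, Nov 17 2018
KNOWN_FORMATS = ("%Y-%m-%d", "%d %b %Y", "%b %d %Y")


def normalise_date(text: str) -> str:
    """Return `text` as an ISO 8601 date, or raise ValueError."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next notation
    raise ValueError(f"unrecognised date: {text!r}")


for raw in ("8 Nov 2018", "Nov 17 2018", "2018-11-22"):
    print(raw, "->", normalise_date(raw))
```

Values that match none of the listed notations raise an error instead of being guessed, so ambiguous entries are looked at by a person.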
References
Altman, M, & Crosas, M. (2013). The evolution of data citation: from principles to implementation. IASSIST Quarterly, 2013, 62-70.
UiT The Arctic University of Norway. (n.d.). UiT Research Data Portal. Retrieved 25.02.2020 from https://uit.no/researchdata
UiT The Arctic University of Norway. (n.d.). DataverseNO. Retrieved 25.02.2020 from https://dataverse.no/
UiT The Arctic University of Norway. (n.d.). info: DataverseNO. Retrieved 25.02.2020 from https://info.dataverse.no/
Whyte, A. (2015). Where to keep research data. DCC checklist for evaluating data repositories. V1. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/resources/how-guides-checklists/where-keep-research-data/where-keep-research-data
All pictures are taken from Colourbox.com, if not otherwise stated.