SMART FROM THE START
OSU Libraries & Press
Clara Llebot Lorente May 7th, 2020
This is what a dissertation looks like in ScholarsArchive@OSU
https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/3484zn94s
OREGON STATE UNIVERSITY 1
Items
• Thesis in pdf
• Spreadsheets
• Movies
• Images
• Code
• Results
• …
Related dataset
https://ir.library.oregonstate.edu/concern/datasets/hm50tx752
Related dataset
https://ir.library.oregonstate.edu/concern/datasets/vd66w525h
3 take home messages
1. Share the data that you generate during your research
2. Share your data as an independent data record that can be cited
3. Manage your data well during your research
What about human subjects?
“As open as possible, as closed as necessary”
H2020 ORD Pilot, European Commission
Hanna Barczyk for NPR
What we’ll cover
• Why?
  o Why publish datasets?
  o Why do it in a separate record?
  o Why data management?
• How to engage in good data management practices?
  o Work with data ethically
  o Keep data safe
  o Keep data useful
What is…?
Data are…
• Samples
• Physical collections
• Maps
• Videos and photographs
• Interviews
• Model results
• …
…and metadata
• Experimental protocols
• Code
• Context information
• Lab notes
• …
What is data management?
Actions that contribute to effective storage, preservation and reuse of data and documentation throughout the research lifecycle.
• Data/computational science
• Database administration
• A research method
  o What data to collect
  o How to collect them
  o How to design an experiment
Why data management?
1. Because it is good for YOU
• Increases research efficiency
• Saves time
• Increases visibility and impact
Piwowar & Vision, 2013 peerj.com/articles/175/
Why data management?
2. Because it is good for SCIENCE
• Accelerates scientific breakthroughs
• Preservation
• Accountability
• Reproducibility
Reproducibility and open data
Maki Naro thenib.com/repeat-after-me
How to do reproducible science?
Computational reproducibility: code, software, hardware
Statistical reproducibility: choice of statistical tests, model parameters, threshold values, etc.
Empirical reproducibility: details about non-computational empirical scientific experiments: open data.
https://doi.org/10.1101/143503
Why data management?
3. Because mandates from federal and other agencies require it.
• Data Management Plans
How to engage in good data management practices?
In each step of the research cycle…
1. How do we work with data ethically?
2. How do we keep data safe?
3. How do we keep data useful?
Think about Data Management and write a Data Management Plan.
Image credits: https://www.dataone.org/data-life-cycle
What does it mean to work with data ethically?
The DataONE data life cycle
1. Data ethics
Protect research subjects and other sensitive data.
Hanna Barczyk for NPR
Research misconduct:
• Data falsification
• Data fabrication
• Data plagiarism
1. Data ethics
What can you do with your data? Can you share it? When? How?
Legal framework + formal agreements
Ownership
Researchers are usually NOT the owners of research data
BUT they can use the data for career advancement and are responsible for it
PI is the data custodian
Funder requirements
Human Subjects Research
By Mark Warner
Institutional policies
1. Take home message
Talk with your team members about expectations for your project’s data
1. Responsibilities
2. Internal data sharing
3. External data sharing
4. Expectations in the lab/field of research/good researcher
1. Data ethics: give attribution
Science is collaborative…
How to give credit to all?
Authorship in scientific publications is an (imperfect) way of doing so
1. Data ethics: give attribution
2. Keep data safe
• Now: keep backups
• After the project: preserve the data
Would your data survive…?
• You start your computer as usual when you get to work and, instead of your files, you get the Windows blue screen of death. After a visit to the IT Service Desk, the conclusion is that your computer cannot be repaired.
• You arrive at work this morning and realize that somebody broke in and stole all the valuable technology in the office, including all the computers and external hard drives.
• There has been a fire at the University. Unfortunately, it has affected your office and the room where your department keeps the shared drive.
University of Southampton, School of Electronics and Computer Science, Southampton, UK, 2005
• You have been working on a deadline for hours and are very tired. You accidentally delete a data file and don’t realize it until the next day.
• There is a glitch in your cloud provider, and several files are automatically destroyed. The cloud company apologizes profusely, but is not able to restore the files.
Backups and storage
Rule of Threes
• Primary local: original (working)
• External local
• External remote
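The Rule of Threes can be sketched in a few lines of Python. This is a minimal illustration, not a full backup tool: the function names `sha256_of` and `back_up` are hypothetical, and the backup directories stand in for whatever external local drive and remote (for example, a mounted network or cloud folder) you actually use. Each copy is checked against the original with a checksum so that a silently corrupted backup is noticed right away.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(original: Path, backup_dirs: list[Path]) -> list[Path]:
    """Copy `original` into each backup directory and verify every copy.

    `backup_dirs` is an assumption of this sketch: one external local
    location and one external remote location.
    """
    expected = sha256_of(original)
    copies = []
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(original, directory / original.name))
        if sha256_of(copy) != expected:
            raise IOError(f"Backup at {copy} does not match the original")
        copies.append(copy)
    return copies
```

Calling `back_up(working_file, [external_local_dir, external_remote_dir])` leaves three verified copies: the working original plus the two backups.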
Preservation of digital content
Traditional content is easy to preserve
Digital content is delicate. Digital preservation is HARD!
CC-BY Quinn Dombrowski
What is a data repository?
• A place to store data
• A place to make data publicly available and findable
• A place to preserve your data
Why use a data repository?
• Share your data: open science
• Comply with your Data Management Plan
• Give credit to data creators
• Preserve your data
Sharing data: repositories
Search domain-specific repositories: www.re3data.org
ScholarsArchive@OSU
https://ir.library.oregonstate.edu/
• Who is it for?
• What can you store in it?
• How to get help?
3. Keep data useful
A. Documented: metadata
B. Organized
By Alan Levine
A. Data documentation: metadata
Structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.
NISO, Understanding Metadata http://www.niso.org/publications/press/UnderstandingMetadata.pdf
• Lab notebooks
• Questionnaires, codebooks
• Software syntax and output files
• Info about equipment settings
• Database schema
• Methodology reports
• Provenance info about sources of derived or digitized data
• Data dictionaries
• …
What is metadata?
• WHO created the data?
• WHAT is the content of the data?
• WHEN were the data created?
• WHERE is it geographically?
• HOW were the data developed?
• WHY were the data developed?
Metadata is: Data ‘reporting’
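One way to keep the six "reporting" answers with a dataset is a small machine-readable record saved next to the data files. The sketch below is a minimal illustration in Python and not a metadata standard: the class name `DatasetMetadata`, the helper `to_json`, and the field names are all hypothetical choices that mirror the questions above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DatasetMetadata:
    """One field per 'reporting' question (hypothetical field names)."""
    who: str    # who created the data
    what: str   # what the data contain
    when: str   # when the data were created, e.g. an ISO date
    where: str  # geographic location
    how: str    # how the data were developed
    why: str    # why the data were developed


def to_json(record: DatasetMetadata) -> str:
    """Serialize the record as indented JSON, keeping the field order."""
    return json.dumps(asdict(record), indent=2)
```

The resulting text can be written to a file stored alongside the data, so the answers travel with the dataset.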
When to generate metadata?
All the time!!! From beginning to end
Unstructured: readme.txt
• Context for the data
• Content of the data package
• Catalog of data fields
Structured: metadata standards
• Structure to describe data with:
  o Common terms to allow consistency
  o Common definitions for easier interpretation
  o Common language for ease of communication
  o Common structure to quickly locate information
• In search and retrieval, standards provide:
  o A reliable and predictable format for computer interpretation
  o A uniform summary description of the dataset
B. Organize your data
“Someone unfamiliar with your project should be able to look at your computer files and understand in detail what you did and why”
“Everything you do, you will probably have to do over again”
Noble, 2009
Noble WS (2009) A Quick Guide to Organizing Computational Biology Projects. PLoS Comput Biol 5(7): e1000424. doi:10.1371/journal.pcbi.1000424
B. Organize your data: meaningful filenames!
project_instrument_location_YYYY-MM-DD-hhmmss_extra.ext
Elements can encode: index/grant; conditions; s/n, variable; date (retain order); other info
• Avoid spaces
• Avoid % ^ & $ # | : and similar
• Lowercase is less software dependent
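The naming pattern above can be applied consistently with a small helper. The following Python sketch is one possible implementation under stated assumptions: the function names `clean` and `build_filename` are hypothetical, hyphens replace spaces and special characters inside each element, and underscores are reserved as the separator between elements.

```python
import re
from datetime import datetime

# Anything that is not a lowercase letter, digit, or hyphen gets replaced.
FORBIDDEN = re.compile(r"[^a-z0-9-]+")


def clean(part: str) -> str:
    """Lowercase one name element and replace spaces/special characters with hyphens."""
    return FORBIDDEN.sub("-", part.lower()).strip("-")


def build_filename(project: str, instrument: str, location: str,
                   when: datetime, extra: str, ext: str) -> str:
    """Build project_instrument_location_YYYY-MM-DD-hhmmss_extra.ext."""
    stamp = when.strftime("%Y-%m-%d-%H%M%S")
    parts = [clean(project), clean(instrument), clean(location), stamp, clean(extra)]
    return "_".join(parts) + "." + ext.lower().lstrip(".")
```

For example, with made-up inputs, `build_filename("Demo Project", "CTD#4", "Dock 5", datetime(2020, 5, 7, 14, 30, 0), "raw", ".CSV")` returns `demo-project_ctd-4_dock-5_2020-05-07-143000_raw.csv`.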
B. Organize your data: meaningful filenames!
Order by type:
• Notes_Gorer_1963-12-15.docx
• Notes_MassObs_1955-04-12.docx
• Questionnaire_Gorer_1963-12-15.pdf
• Questionnaire_MassObs_1955-04-12.pdf
Forced order with numbering:
• 01_MassObs_questionnaire_1955-04-12.pdf
• 02_MassObs_notes_1955-04-12.docx
• 03_Gorer_questionnaire_1963-12-15.pdf
• 04_Gorer_notes_1963-12-15.docx
Order by date:
• 1955-04-12_notes_MassObs.docx
• 1955-04-12_questionnaire_MassObs.pdf
• 1963-12-15_notes_Gorer.docx
• 1963-12-15_questionnaire_Gorer.pdf
Order by subject:
• Gorer_notes_1963-12-15.docx
• Gorer_questionnaire_1963-12-15.pdf
• MassObs_notes_1955-04-12.docx
• MassObs_questionnaire_1955-04-12.pdf
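The listings above follow from plain character-order sorting: whichever element comes first in the filename decides how the files group. A short Python sketch using the example filenames from this slide shows the effect, assuming a simple character-by-character sort (some file browsers apply their own locale or "natural" ordering instead).

```python
# Filenames taken from the examples above, deliberately shuffled.
date_first = [
    "1963-12-15_questionnaire_Gorer.pdf",
    "1955-04-12_notes_MassObs.docx",
    "1963-12-15_notes_Gorer.docx",
    "1955-04-12_questionnaire_MassObs.pdf",
]
subject_first = [
    "MassObs_questionnaire_1955-04-12.pdf",
    "Gorer_notes_1963-12-15.docx",
    "MassObs_notes_1955-04-12.docx",
    "Gorer_questionnaire_1963-12-15.pdf",
]

# YYYY-MM-DD dates sort chronologically even when compared as plain text.
by_date = sorted(date_first)

# Leading with the subject groups each subject's files together.
by_subject = sorted(subject_first)

for name in by_date + by_subject:
    print(name)
```

The same files end up grouped by date or by subject purely because of which element leads the name, which is why the YYYY-MM-DD form is worth keeping.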
Organize your data: file structure
data
    samples.mat
New version:
data
    samples.old.mat
    samples.mat
New version:
data
    samples.old.mat
    samples.old2.mat
    samples.mat
In general, renaming or moving files is bad practice:
• Makes it harder to reproduce results
• Makes it harder to find data later
• Breaks scripts and symbolic links
Adding new filenames without structure is not much better…
data
    samples.mat
    samples.V2.mat
    samplesFinal.mat
    samplesFinalV2.mat
    samples_USE_THIS_ONE.mat
Which one is the most recent??
data
    2016-09-28
        samples.mat
    2016-10-15
        samples.mat
    2016-11-14
        samples.mat
We are not renaming files and it is clear which version is newer. BUT we do not know the differences between data sets.
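With dated folders, finding the newest version can be automated instead of guessed. The Python sketch below assumes the layout described here, a `data` folder whose subfolders are named YYYY-MM-DD and each hold a file with the same name; the function name `newest_version` is hypothetical.

```python
from datetime import date
from pathlib import Path


def newest_version(data_dir: Path, filename: str) -> Path:
    """Return the copy of `filename` inside the most recent YYYY-MM-DD subfolder."""
    dated = []
    for folder in data_dir.iterdir():
        candidate = folder / filename
        if not (folder.is_dir() and candidate.is_file()):
            continue
        try:
            dated.append((date.fromisoformat(folder.name), candidate))
        except ValueError:
            continue  # skip folders whose names are not ISO dates
    if not dated:
        raise FileNotFoundError(f"No dated copy of {filename} under {data_dir}")
    # Compare on the date only and return the matching file path.
    return max(dated, key=lambda pair: pair[0])[1]
```

This answers "which one is the most recent?" but, as noted above, not "what changed?"; that still needs documentation.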
B. Data documentation: readme.txt
readme.txt
• Context for the data
• Content of the data package
• Catalog of data fields
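A readme.txt with these three parts can be generated from a simple template so that every data folder gets the same structure. The Python sketch below is only an illustration: the function name `render_readme`, the section headings, and the "- name: description" layout are assumptions, not a required format.

```python
def render_readme(context: str, files: dict[str, str], fields: dict[str, str]) -> str:
    """Render a plain-text readme with context, package content, and a field catalog.

    `files` maps file names to descriptions; `fields` maps data field
    (column/variable) names to descriptions. Both are hypothetical inputs.
    """
    lines = ["CONTEXT FOR THE DATA", context, "", "CONTENT OF THE DATA PACKAGE"]
    lines += [f"- {name}: {description}" for name, description in files.items()]
    lines += ["", "CATALOG OF DATA FIELDS"]
    lines += [f"- {name}: {description}" for name, description in fields.items()]
    return "\n".join(lines) + "\n"
```

The returned text can be saved as readme.txt next to the data it describes, for example with `Path("readme.txt").write_text(...)`.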
[Diagram: the data folder holds dated subfolders (2016-09-28, 2016-11-15); a readme.txt provides the data documentation for the data, and with many data files each group is kept together with its own readme.txt.]
Need help? Contact us
• One-on-one consultations about research data management
  o Data Management Plans
  o Documentation and organization of data
  o Data curation for deposit in a repository
  o Any aspect of the data life cycle
• Deposit your data and publications to ScholarsArchive@OSU
• Workshops and class visits on data management
• Author’s Rights and Intellectual Property Issues
Clara Llebot Lorente | Data Management Specialist
[email protected]
bit.ly/OSUData