Date post: | 25-May-2015 |
Upload: | jackie-wirz-phd |
Data management.
Nicole Vasilevsky, NCNM, OHSU
Jackie Wirz, OHSU
Melissa Haendel, OHSU
Outline
• Introduction
• Why do we need good data management?
• Good data management
• Databases and tools
• Sharing your data
Who are we?
• Nicole Vasilevsky, PhD
– Assistant Professor, Helfgott Research Institute, NCNM
– Project Manager, Ontology Development Group, OHSU
• Jackie Wirz, PhD
– Assistant Professor, Bioinformation Specialist, OHSU Library
• Melissa Haendel, PhD
– Assistant Professor, Department Head, Ontology Development Group, OHSU
What does data mean to you?
Do you have any training in data management?
Do you know what metadata is?
a. Philosophy
b. describes data
c. dating site
d. data
What is data?
• Clinical data
• Experimental data
• School-related data
• Personal data
• Social data
So much data
Why?
Personal organization
Credit where credit is due
Reproducibility of science and medicine
Accelerates scientific and clinical discovery
Efficiency
Do you get frustrated with any of the following in your personal or professional life?
a. Storing data
b. Backing up data
c. Analyzing/manipulating data
d. Finding data produced by other researchers/clinicians
e. Ensuring data are secure
f. Making data accessible to other researchers
g. Controlling access to data
h. Tracking updates to data (i.e., versioning)
i. Creating metadata (i.e., describing the data so they are more useful later or to others)
j. Protecting intellectual property rights
k. Ensuring appropriate professional credit/citation is given for generated data sets
http://davidmichaelangelosilva.wordpress.com/2012/01/29/organize-your-messy-desktop-with-fences/
Messy Desktop?
Which of the following do you do?
a. Save copies of data on a disk, USB drive, tape, or computer hard drive
b. Save copies of data on a local server
c. Save copies of data on a central campus server
d. Save copies of data on a web-based or cloud server
e. Store data in a repository or archives
f. Automatically backup files
g. Manually generate backup
h. Restrict access to files
Credit where credit is due
Data collection & Analysis
Authoring
Storage, Archiving, & Preservation
Publication & Dissemination
The scholarly communication cycle
Reproducibility of science
• Lack of information makes it difficult to reproduce experiments
• Retraction rates are on the rise
• Difficulty identifying resources in the published literature
Cokol et al. EMBO reports (2008) 9, 2
Sharing can be advantageous
http://www.flickr.com/photos/eltonl/107582334/sizes/l/in/photostream/
Why share your data?
• Data sharing mandates
– NIH public access policy
– NIH/NSF data sharing plans for new applications
• Further science and medicine
• Build collaborations
• Enable new discoveries with your data
• Can be required at time of publication
Efficiency
http://hbr.org/2012/10/big-data-the-management-revolution
https://upload.wikimedia.org/wikipedia/commons/b/ba/HMS_Surprise_at_sunset_with_airplane.jpg
How?
• File naming and data storage
• Metadata
• Controlled vocabularies and ontologies
• Databases and Tools
• Data accessibility
File naming
Informative file names
Will I remember what this file is a month from now?
Naming conventions
Project_instrument_location_YYYYMMDDhhmmss_extra.ext
• Index/grant, conditions, s/n, variable
• Leading zero!
• Retain order
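A minimal Python sketch of the naming convention above. All field values (grant index `R01HL123`, instrument serial `SN4471`, location `lab2`, extra tag `rep1`) are hypothetical. It shows why a fixed field order with a zero-padded timestamp matters: plain alphabetical sorting then matches chronological order, and the name can be split back into its parts.

```python
from datetime import datetime

def build_filename(project, instrument, location, when, extra, ext):
    """Assemble Project_instrument_location_YYYYMMDDhhmmss_extra.ext in a fixed order."""
    # strftime's %m, %d, %H, %M and %S always emit two digits (leading zeros kept).
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{project}_{instrument}_{location}_{stamp}_{extra}.{ext}"

def parse_filename(name):
    """Split a conforming name back into fields; assumes no '_' inside any field."""
    stem, ext = name.rsplit(".", 1)
    project, instrument, location, stamp, extra = stem.split("_")
    return {
        "project": project,
        "instrument": instrument,
        "location": location,
        "when": datetime.strptime(stamp, "%Y%m%d%H%M%S"),
        "extra": extra,
        "ext": ext,
    }

# Hypothetical values for illustration only.
a = build_filename("R01HL123", "SN4471", "lab2", datetime(2013, 5, 3, 9, 7, 5), "rep1", "csv")
b = build_filename("R01HL123", "SN4471", "lab2", datetime(2013, 11, 20, 14, 0, 0), "rep1", "csv")
print(a)                         # R01HL123_SN4471_lab2_20130503090705_rep1.csv
print(sorted([b, a]) == [a, b])  # True: alphabetical order is also chronological
print(parse_filename(a)["when"])
```

The underscore is reserved as the field separator here, which is why the parser assumes no field contains one.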
Directory Structure
Sticking with a directory structure can be hard.
Files: SPARC presentation, CTSAconnect presentation, Monarch presentation
Folder: Presentations
– SPARC
– CTSAconnect
– Monarch
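One way to stick with a directory structure is to script it. This sketch assumes files can be matched to a project by a keyword in their name; the `PROJECTS` list and file names are hypothetical, echoing the folders on the slide. It works in a temporary folder so it runs anywhere.

```python
import tempfile
from pathlib import Path

# Hypothetical project names, echoing the folders on the slide.
PROJECTS = ["SPARC", "CTSAconnect", "Monarch"]

def file_into_projects(root):
    """Move loose files in root into root/Presentations/<project>/ by keyword match."""
    moved = []
    for f in sorted(root.iterdir()):  # sorted() snapshots the listing before moving
        if not f.is_file():
            continue
        for project in PROJECTS:
            if project.lower() in f.name.lower():
                target_dir = root / "Presentations" / project
                target_dir.mkdir(parents=True, exist_ok=True)
                moved.append(f.rename(target_dir / f.name))  # returns the new path
                break
    return moved

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["SPARC presentation.pptx", "CTSAconnect presentation.pptx",
                 "Monarch presentation.pptx"]:
        (root / name).touch()
    layout = sorted(p.relative_to(root).parts for p in file_into_projects(root))

for parts in layout:
    print("/".join(parts))
```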
Versioning
Example file name: DataManagement_SPARC_050313_final_NV
• Save a copy of every version of a data file
• Follow a file naming convention
• Version control software, or services that keep version history
– Dropbox
– Google Docs
– Git
– SmartSVN
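Tools such as Git track versions for you; if you save copies by hand, a small helper can at least guarantee that no earlier version is overwritten. A minimal sketch, assuming a hypothetical `versions/` archive folder and a `_v01`, `_v02` suffix scheme; this is not a feature of any of the tools listed.

```python
import re
import shutil
import tempfile
from pathlib import Path

def save_version(path, archive):
    """Copy path into archive as <stem>_vNN<suffix>, always using the next free number."""
    archive.mkdir(parents=True, exist_ok=True)
    pattern = re.compile(rf"{re.escape(path.stem)}_v(\d+){re.escape(path.suffix)}")
    existing = [int(m.group(1)) for p in archive.iterdir()
                if (m := pattern.fullmatch(p.name))]
    next_num = max(existing, default=0) + 1
    target = archive / f"{path.stem}_v{next_num:02d}{path.suffix}"
    shutil.copy2(path, target)  # copy2 also tries to preserve file metadata
    return target

with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "DataManagement_SPARC_notes.txt"  # hypothetical working file
    archive = Path(tmp) / "versions"                     # hypothetical archive folder
    work.write_text("draft 1")
    first = save_version(work, archive).name
    work.write_text("draft 2")
    second = save_version(work, archive).name
    contents = sorted(p.read_text() for p in archive.iterdir())

print(first, second)  # DataManagement_SPARC_notes_v01.txt DataManagement_SPARC_notes_v02.txt
```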
Dropbox
www.dropbox.com
Google Docs
Remember to backup your data!
• Keep three copies of your data:
– 1 on your local workstation
– 1 local/removable, such as an external hard drive
– 1 remote, such as on a cloud server*
*Depending on the type of data, as cloud servers are not always secure
http://libraries.mit.edu/guides/subjects/data-management/Managing%20Research%20Data%20101.pdf
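Making a copy is only half of a backup; it also helps to confirm each copy is intact. A minimal sketch, assuming the backup locations are reachable as ordinary folders. The `external_drive` and `cloud_sync` names are hypothetical stand-ins created in a temporary directory; the script copies a file to each and compares SHA-256 checksums.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256(path):
    """Hash a file in chunks so large data files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def back_up(source, destinations):
    """Copy source into each destination folder; report whether each copy verifies."""
    expected = sha256(source)
    results = {}
    for dest in destinations:
        dest.mkdir(parents=True, exist_ok=True)
        copy = shutil.copy2(source, dest / source.name)
        results[dest.name] = sha256(copy) == expected
    return results

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "workstation" / "results.csv"  # copy 1: the working file
    original.parent.mkdir()
    original.write_text("sample,value\nA,1\n")
    # Hypothetical stand-ins for copy 2 (removable drive) and copy 3 (remote/cloud).
    results = back_up(original, [root / "external_drive", root / "cloud_sync"])

print(results)  # {'external_drive': True, 'cloud_sync': True}
```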
Organizing your IRB application
Created by Heather Schiffke
See: http://libguides.ohsu.edu/data
File renaming applications
• Bulk Rename Utility (Windows)
• Renamer (Mac)
• PSRenamer
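The applications above do bulk renaming through a graphical interface; the same idea can be scripted. This sketch zero-pads the trailing number in each file name (so `img10` no longer sorts before `img2`) and defaults to a dry run so the plan can be reviewed first. The `img*.tif` names are hypothetical.

```python
import re
import tempfile
from pathlib import Path

def pad_numbers(folder, width=3, dry_run=True):
    """Zero-pad the trailing number in file names, e.g. img7.tif -> img007.tif.

    Returns the (old, new) rename plan; only renames when dry_run is False.
    """
    plan = []
    for f in sorted(folder.iterdir()):
        m = re.fullmatch(r"(.*?)(\d+)(\.\w+)", f.name)
        if not (f.is_file() and m):
            continue
        new_name = f"{m.group(1)}{int(m.group(2)):0{width}d}{m.group(3)}"
        # Skip names that are already padded or would clash with an existing file.
        if new_name != f.name and not (folder / new_name).exists():
            plan.append((f.name, new_name))
    if not dry_run:
        for old, new in plan:
            (folder / old).rename(folder / new)
    return plan

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ["img1.tif", "img2.tif", "img10.tif"]:  # hypothetical files
        (folder / name).touch()
    preview = pad_numbers(folder)  # dry run: nothing renamed yet
    pad_numbers(folder, dry_run=False)
    after = sorted(p.name for p in folder.iterdir())

print(after)  # ['img001.tif', 'img002.tif', 'img010.tif']
```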
Metadata
What is Metadata?
Title, Author, Call number, Publisher, ISBN
File name, File type
Who created the data
Title
Date created
Using structured phenotype data to identify genetic basis of disease
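A minimal sketch of recording the metadata fields above as a JSON "sidecar" kept alongside a data file. The class, field names, file name and creator are hypothetical; a real project would follow whatever metadata standard its repository requires.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date

@dataclass
class DatasetMetadata:
    """Descriptive metadata for one data file, using the fields listed above."""
    file_name: str
    file_type: str
    creator: str       # who created the data
    title: str
    date_created: str  # ISO 8601 (YYYY-MM-DD) so dates sort and parse unambiguously

record = DatasetMetadata(
    file_name="phenotypes_20130503.csv",  # hypothetical file
    file_type="text/csv",
    creator="Jane Doe",                   # hypothetical creator
    title="Using structured phenotype data to identify genetic basis of disease",
    date_created=date(2013, 5, 3).isoformat(),
)

# Serialize to JSON; in practice this text would be saved next to the data file
# (hypothetical naming: phenotypes_20130503.csv.json) and read back later.
sidecar = json.dumps(asdict(record), indent=2)
restored = DatasetMetadata(**json.loads(sidecar))
print(sidecar)
```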
Metadata standards: Controlled vocabularies and ontologies
Controlled vocabularies
MeSH
acetaminophen
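A controlled vocabulary maps many surface forms (synonyms, brand names, misspellings) to one preferred term. The table below is illustrative only and is not copied from MeSH; real lookups should query the MeSH vocabulary itself.

```python
from typing import Optional

# Illustrative entry-term table only; NOT taken from MeSH.
ENTRY_TERMS = {
    "acetaminophen": "Acetaminophen",
    "paracetamol": "Acetaminophen",
    "tylenol": "Acetaminophen",
    "acetominophen": "Acetaminophen",  # a common misspelling
}

def normalize(term: str) -> Optional[str]:
    """Map free-text input to its preferred term; None means 'not in the vocabulary'."""
    return ENTRY_TERMS.get(term.strip().lower())

free_text = ["Paracetamol ", "TYLENOL", "acetominophen", "aspirin"]
print([normalize(t) for t in free_text])
# ['Acetaminophen', 'Acetaminophen', 'Acetaminophen', None]
```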
What is an Ontology?
1. Terms are arranged hierarchically and defined both textually and logically
2. Relationships between the terms are defined
3. Expressed in a language that can be reasoned across by computers
4. Data can be reused and can be easily linked together
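Points 1, 3 and 4 can be illustrated with a toy `is_a` hierarchy: data annotated with a specific term can be found by a query for a more general one. The hierarchy, sample names and helper functions are made up for illustration, and the hand-rolled transitive closure is a far simpler stand-in for a real ontology reasoner.

```python
# Toy is_a hierarchy for illustration only (not taken from any real ontology).
IS_A = {
    "acetaminophen": ["analgesic"],
    "ibuprofen": ["analgesic", "anti-inflammatory agent"],
    "analgesic": ["drug"],
    "anti-inflammatory agent": ["drug"],
    "drug": [],
}

def ancestors(term):
    """All terms reachable via is_a (transitive closure), excluding the term itself."""
    seen = set()
    stack = list(IS_A.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, []))
    return seen

# Hypothetical annotations: each sample is tagged with one ontology term.
annotations = {"sample1": "acetaminophen", "sample2": "ibuprofen", "sample3": "drug"}

def samples_about(query):
    """Samples tagged with the query term or with any of its descendants."""
    return sorted(s for s, t in annotations.items() if t == query or query in ancestors(t))

print(samples_about("analgesic"))                # ['sample1', 'sample2']
print(samples_about("anti-inflammatory agent"))  # ['sample2']
```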
Commonly Used Ontologies
• Gene Ontology
• Linnaean Taxonomy
• SNOMED
Why are CVs and Ontologies useful?
• Can be used to structure your metadata
• Are often used to structure information in databases
Structured data helps with searching
Craigslist search: Chaise
Craigslist matches on strings only
Craigslist search: Fainting couch
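The Craigslist contrast can be sketched in a few lines: a plain substring search finds only the exact words typed, while a search over shared concept identifiers finds both listings. The listings, labels and `FURN:` identifiers are invented for illustration.

```python
# Hypothetical listings; "concept" is a made-up identifier a structured site might attach.
LISTINGS = [
    {"title": "Antique chaise, great condition", "concept": "FURN:0001"},
    {"title": "Victorian fainting couch", "concept": "FURN:0001"},
    {"title": "Kitchen chair", "concept": "FURN:0002"},
]

# Hypothetical controlled vocabulary: several labels point to one concept.
LABELS = {
    "chaise": "FURN:0001",
    "chaise longue": "FURN:0001",
    "fainting couch": "FURN:0001",
    "chair": "FURN:0002",
}

def string_search(query):
    """Match on strings only, as in the Craigslist example."""
    return [l["title"] for l in LISTINGS if query.lower() in l["title"].lower()]

def concept_search(query):
    """Translate the query to a concept, then match listings by concept."""
    concept = LABELS.get(query.lower())
    return [l["title"] for l in LISTINGS if l["concept"] == concept]

print(string_search("chaise"))   # only the listing containing the word "chaise"
print(concept_search("chaise"))  # both the chaise and the fainting couch
```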
Structured data helps with searching
PubMed indexes articles with MeSH Terms
In Summary: Structured Metadata = good
How can I create structured metadata?
http://www.flickr.com/photos/san_drino/1454922072/
Databases and Tools… (to make your life easier)
http://farm4.static.flickr.com/3560/3332644561_c9d5041d02.jpg
Data Management tools and repositories
• Purpose: Software where you can organize, store and/or share data
• Often contain metadata to assist with data entry and create structured data
Tools for data management
Data Sharing Repositories
http://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html
Repositories use Unique IDs
• Digital Object Identifier (DOI)
• Example: DOIs for publications
– doi: 10.1371/journal.pbio.1001339
• Uniform Resource Identifier (URI)
• A URI identifies a single resource on the web
• URIs for people
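Because a DOI is a stable identifier, it can be turned into a resolvable link by prefixing the `https://doi.org/` resolver. A minimal sketch; the helper name is hypothetical and the validation pattern is a simplified assumption that will not accept every valid DOI.

```python
import re
from urllib.parse import quote

# Simplified check: "10." + registrant digits + "/" + non-space suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def doi_to_url(raw: str) -> str:
    """Normalize 'doi: 10.x/...' or doi.org URL forms to an https://doi.org/ link."""
    doi = re.sub(r"^(doi:\s*|https?://(dx\.)?doi\.org/)", "", raw.strip(),
                 flags=re.IGNORECASE)
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    # Percent-encode unusual characters while keeping common DOI punctuation.
    return "https://doi.org/" + quote(doi, safe="/:;()._-")

print(doi_to_url("doi: 10.1371/journal.pbio.1001339"))
# https://doi.org/10.1371/journal.pbio.1001339
```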
People Disambiguation
• Example:
• John L Campbell, Research Ecologist, Oregon State University, Corvallis
OR
• John L Campbell, Research Ecologist, Center for Research on Ecosystem Change, Durham, NC
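A sketch of why a unique identifier works better than a name string. Grouping papers by name merges two different people called John L Campbell and splits one person whose name is written differently. Grouping by a person URI does neither. All records, papers and `example.org` URIs are hypothetical.

```python
from collections import defaultdict

# Hypothetical author records; the person_uri values are placeholders, not real IDs.
records = [
    {"paper": "Paper A", "name": "John L Campbell", "person_uri": "https://example.org/person/1"},
    {"paper": "Paper B", "name": "John L Campbell", "person_uri": "https://example.org/person/2"},
    {"paper": "Paper C", "name": "Campbell, J. L.", "person_uri": "https://example.org/person/1"},
]

def group_by(key):
    """Collect paper titles under each distinct value of the given field."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r["paper"])
    return dict(groups)

print(group_by("name"))        # conflates two people, splits one
print(group_by("person_uri"))  # one group per actual person
```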
Tools for personal data management
• Google Drive
• Dropbox
• Evernote
• TaskPaper
• Diigo: bookmarking websites
• Mendeley, EndNote, Zotero: citation managers
• SoundGecko
http://blogs.scientificamerican.com/information-culture/2012/12/10/managing-personal-knowledge-data-and-information/
URLs to resources
Go to: http://libguides.ohsu.edu/data
Data Sharing and Management Snafu in 3 Short Acts