Posted: 24-Jan-2015 | Uploaded by: garethknight
Data Management for Librarians: An Introduction
February 19th 2013
Gareth Knight, Manager, RDM Support Service
What is Data?
• May originate from various sources: primary and/or secondary
• May contain different content: quantitative and/or qualitative
• May be expressed in different forms: datasets, still images, audio-video, audio recordings, interactive resources
• May be held in a number of variations: raw, cleaned, anonymised/pseudonymised, analysed
• May be encoded in different formats: MS Excel, TIFF, MPEG2, STATA, FoxPro
What type of data do you have at home?
“Data are facts, observations or experiences on which an argument, theory or test is based. Data may be numerical, descriptive or visual. Data may be raw or analysed, experimental or observational.” (http://research.unimelb.edu.au/integrity/conduct/data/review)
Data in the Research Lifecycle
Brainstorm
Develop Proposal: produce a Data Management Plan
Plan Project
Perform Research: create / reuse, analyse, store, describe, share
Write-up Results
Finalise & submit: share, archive
What is Data Management?
1. Plan
• Determine requirements
• Identify risks & opportunities
• Decide approach
2. Implement
3. Monitor
• Evaluate approach
• Change approach / perform corrective action
4. Evaluate
• Is it fit for purpose?
• What additional action is needed?
‘Benign neglect’ and poorly made decisions in the short term will have long-term implications.
Short-term decisions with long-term implications:
• Software products
• File formats & standards
• Data organisation & labelling
• Quality controls
Why does data need to be managed?
• Ensure data can be located
• Enable analysis
• Ensure data can be understood for current and future needs
• Enable sharing & validation (“Interesting paper. Where’s the data?”)
• Comply with funder & School requirements
Researcher Challenges
Issues/challenges encountered when creating, managing, and sharing research data (web survey results)
Response type: multiple-choice checkbox + free text for other challenges
Other challenges:
• Database creation & management
• Storage of physical questionnaires
• Lack of time
• Software instability (particularly NVivo)
• Ability to enter & access data at different locations
Training Needs
Interest in training on topics related to data management (web survey results)
Note: graph omits percentages for other responses (none, slight, moderate, no opinion)
RDM Support Service
Location of Library staff
Role of Library staff
• Provide a first point of contact
• Help researchers to express requirements & needs
• Direct researchers to potential solutions (staff, website)
• Contribute to training activities
• Incorporate data considerations into teaching
Data Access Over Time: digital vs. analogue
Digital data = information content + computer + OS + application
“traditionally, preserving things meant keeping them unchanged; however … if we hold on to digital information without modifications, accessing the information will become increasingly more difficult, if not impossible.” (Su-Shing Chen, 2001)
Change in Process over Time
[Figure: the hardware, operating system and software application needed to access the information content change over time: Intel PC, 2000; Mac laptop, 2006; x64 Ubuntu laptop, 2010]
Task
• Select two of the following problems when managing digital data:
1. Difficulty locating data
2. Difficulty accessing media
3. Difficulty rendering data in an understandable form
4. Difficulty recreating data as originally intended
5. Difficulty understanding information content
6. Uncertain provenance
Consider the following questions:
a. In what circumstances will the chosen problem occur?
b. What consequences may follow if the problem occurs (e.g. financial implications)?
c. How could you ensure that the problem doesn’t occur?
d. What could you do to resolve the problem after it has occurred? (You can refer researchers to someone for help.)
1. Difficulty Locating Data
Problem
“I created some data 5 years ago. Where is it?”
“I’ve lost my original disk. Do I have the data elsewhere?”
Scenarios & Reasons
• Loss of storage media
• Lots of data stored in many locations
• Vague filenames make data difficult to locate
(Potential) Solutions
Preventative:
• Copy data to several storage devices to increase the likelihood of finding it
Post event:
• Find better discovery software?
• Attempt to recreate content?
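For researchers whose data is scattered across several drives, a short script can search every known location at once. The Python below is a minimal sketch, not a recommended tool: `find_files` is a hypothetical helper, the folder paths are placeholders, and it matches on filenames only, so it cannot help much when the filenames themselves are vague.

```python
import tempfile
from pathlib import Path


def find_files(roots, keyword):
    """Search several storage locations for files whose names contain keyword.

    roots: directories to search (e.g. a laptop folder, a USB drive, a
    network share). Locations that are missing or not mounted are skipped.
    Matching is case-insensitive and looks at filenames only, not contents.
    """
    keyword = keyword.lower()
    matches = []
    for root in map(Path, roots):
        if not root.is_dir():
            continue  # drive not connected or folder moved: skip it
        for path in root.rglob("*"):
            if path.is_file() and keyword in path.name.lower():
                matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    # Demo: two temporary folders stand in for a laptop and a USB drive.
    with tempfile.TemporaryDirectory() as laptop, tempfile.TemporaryDirectory() as usb:
        (Path(laptop) / "survey").mkdir()
        (Path(laptop) / "survey" / "Survey_2010_raw.csv").write_text("id,score\n")
        (Path(usb) / "survey_2010_backup.csv").write_text("id,score\n")
        (Path(usb) / "notes.txt").write_text("misc\n")
        for hit in find_files([laptop, usb, "/no/such/drive"], "survey_2010"):
            print(hit)
```

Skipping missing locations rather than failing reflects the scenario on the slide: an old disk may simply no longer be available.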
2. Difficulty Accessing Media
Problem
“How do I access this old media?”
“Why can’t I read this disk?”
Scenarios & Reasons
• Media obsolescence
• Physical deterioration & failure
(Potential) Solutions
Preventative:
• Copy data to several storage devices
• Transfer data to new storage media on obsolescence, or every 3 years
• Deposit data into a data archive and/or copy it to a server
Post event:
• Data recovery software
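Copying data to several devices, or transferring it to new media, is more reliable if each copy is checked after it is made. A minimal Python sketch, assuming SHA-256 checksums are used to confirm that copies are identical (fixity checking is an addition, not something the slide prescribes); `replicate` and the destination folders are hypothetical:

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source, destinations):
    """Copy source into each destination folder and verify every copy.

    Returns the checksum of the source file. Raises OSError if a copy does
    not match, so a faulty device or transfer is noticed immediately.
    """
    source = Path(source)
    expected = sha256(source)
    for folder in map(Path, destinations):
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder / source.name))
        if sha256(copy) != expected:
            raise OSError(f"checksum mismatch for {copy}")
    return expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "interviews.csv"
        original.write_bytes(b"id,transcript\n1,hello\n")
        # Hypothetical stand-ins for an external drive and a server share.
        checksum = replicate(original, [Path(tmp) / "usb", Path(tmp) / "server"])
        print("all copies verified:", checksum)
```

Storing the returned checksum alongside the data also allows a later check, for example after the next transfer to new media, that nothing has changed.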
Potential Storage Locations
Local machine & storage
Pros: cheap, high-capacity storage, fast access
Cons: lack of support; potential for theft, loss, or damage
Academic storage systems (Recommended)
Pros: automatic monitoring & backup, multiple redundancy, remote access, secure (if required)
Cons: limited space allocation, not always accessible overseas
Third-party service providers
Pros: automated backup, accessible in different countries (usually)
Cons: security concerns, ownership concerns, services can close accounts at any time
Image: http://www.flickr.com/photos/m0n0/4479450696/
3. Difficulty Rendering Data
Problem
“How can I view this data?”
“Where do I find software to access my data?”
Scenarios & Reasons
• Software obsolescence
• New software uses a different decoding method
(Potential) Solutions
Preventative:
• Transform data to new formats (format conversion strategy)
• Maintain original machine and software to access content (computer museum)
Post event:
• Track down original software product
• Emulate original environment (emulation/virtualisation)
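As one small illustration of the format conversion strategy, the sketch below migrates a legacy tab-delimited text file in the Windows-1252 character encoding to a UTF-8 CSV file, leaving the original untouched. The source encoding, the delimiter and the `migrate_to_csv` helper are assumptions chosen for the example; real migrations (e.g. STATA or FoxPro files) would need format-specific tools.

```python
import csv
import tempfile
from pathlib import Path


def migrate_to_csv(source, target, source_encoding="cp1252", delimiter="\t"):
    """Convert a legacy delimited text file into a UTF-8 CSV file.

    The source file is only read, never modified, so the original can be
    kept alongside the migrated copy. Returns the number of rows written
    (including the header row).
    """
    rows = 0
    with open(source, newline="", encoding=source_encoding) as src, \
            open(target, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src, delimiter=delimiter):
            writer.writerow(row)
            rows += 1
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        legacy = Path(tmp) / "legacy_export.txt"
        legacy.write_bytes("site\tresult\nCaf\u00e9 Nord\t3\n".encode("cp1252"))
        migrated = Path(tmp) / "legacy_export.csv"
        print(migrate_to_csv(legacy, migrated), "rows migrated")
        print(migrated.read_text(encoding="utf-8"))
```

Keeping the original next to the migrated copy means the conversion can be redone if a problem with it is found later.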
Choosing File Formats
Consider formats for each stage: creation, preservation, dissemination.
When working with multiple copies, decide which is the master copy.
Content Type | Preferred Format | Acceptable Alternatives
Documents | Rich Text Format | Microsoft DocX, Open Document Format
Still Images | TIFF | JPEG 2000 (uncompressed), PNG, RAW
Audio | WAV | AIFF, FLAC, MP3
Audio-video | MPEG2, MPEG4 | (none listed)
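Before deciding whether files are in a preferred format, it helps to know what format they really are, since file extensions can be wrong. Many formats begin with a fixed byte signature (a "magic number"). The Python sketch below checks a handful of signatures for formats in the table; it is deliberately incomplete, the `identify` helper is hypothetical, and dedicated identification tools such as DROID (which uses the PRONOM registry) or the Unix `file` command cover far more formats.

```python
import tempfile
from pathlib import Path

# Leading byte signatures for some formats in the table (not exhaustive).
SIGNATURES = [
    (b"II*\x00", "TIFF"),                              # little-endian TIFF
    (b"MM\x00*", "TIFF"),                              # big-endian TIFF
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000"),  # JP2 signature box
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"{\\rtf", "Rich Text Format"),
    (b"fLaC", "FLAC"),
    (b"ID3", "MP3 (with ID3 tag)"),
    (b"\x00\x00\x01\xba", "MPEG-1/MPEG-2 program stream"),
    (b"PK\x03\x04", "ZIP container (e.g. DocX or Open Document Format)"),
]


def identify(path):
    """Guess a file's format from its first bytes; returns 'unknown' if no match."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    # WAV and AIFF put a 4-byte size field before the format name.
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "WAV"
    if head[:4] == b"FORM" and head[8:12] == b"AIFF":
        return "AIFF"
    # MPEG-4 and related files carry an 'ftyp' box name at offset 4.
    if head[4:8] == b"ftyp":
        return "ISO base media file (e.g. MPEG4)"
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "scan.jpg"  # misleading extension
        sample.write_bytes(b"II*\x00" + bytes(12))
        print(sample.name, "->", identify(sample))  # prints: scan.jpg -> TIFF
```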
4. Difficulty Maintaining Authenticity
Problem
“Why does my data look different?”
Scenarios & Reasons
• A new version of the software application uses a different decoding method
• A different software application is in use
(Potential) Solutions
Preventative:
• Determine significant properties that should be maintained
• Maintain original machine and software to access content (computer museum)
Post event:
• Emulate original environment (emulation/virtualisation)
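Determining significant properties is easiest to see with a table. The sketch below assumes, for illustration only, that the significant properties of a spreadsheet-like file are its column names, its number of data rows and its exact cell values, and compares an original file with a converted copy. `significant_properties` and `changed_properties` are hypothetical helpers; other content types (images, audio, documents) would need different properties.

```python
import csv
import hashlib
import tempfile
from pathlib import Path


def significant_properties(path, delimiter=",", encoding="utf-8"):
    """Summarise properties of a table assumed to be significant:
    column names, number of data rows, and a digest of all cell values."""
    with open(path, newline="", encoding=encoding) as fh:
        rows = list(csv.reader(fh, delimiter=delimiter))
    header, body = (rows[0], rows[1:]) if rows else ([], [])
    digest = hashlib.sha256()
    for row in body:
        # Unit separator (0x1F) between cells and record separator (0x1E)
        # between rows, so ["a", "bc"] and ["ab", "c"] hash differently.
        digest.update("\x1f".join(row).encode("utf-8") + b"\x1e")
    return {"columns": header, "rows": len(body), "cells": digest.hexdigest()}


def changed_properties(original, migrated):
    """Return the names of properties that differ between two summaries."""
    return [key for key in original if original[key] != migrated.get(key)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "scores.tsv"
        original.write_text("id\tscore\n1\t0.50\n", encoding="utf-8")
        converted = Path(tmp) / "scores.csv"
        converted.write_text("id,score\n1,0.5\n", encoding="utf-8")  # trailing zero lost
        before = significant_properties(original, delimiter="\t")
        after = significant_properties(converted)
        print("changed:", changed_properties(before, after))  # prints: changed: ['cells']
```

The demo shows the kind of silent change this catches: a program rewriting 0.50 as 0.5 during conversion.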
5. Difficulty Understanding Content
Problem
“Where was this information created?”
“Why did the creator make this decision?”
“What does this value mean?”
“How does this data relate to other content?”
Scenarios & Reasons
• Memory fails: cannot remember decisions made
• Disorganised and poorly labelled data
• Lack of documentation
Does a Rosetta stone exist for your data?
(Potential) Solutions
• Organise data (chronology, experiment type, location, content type)
• Adopt labelling conventions
• Documentation
Filename conventions
• Consider the elements that will help you to organise and locate content
– E.g. participant ID, site of data collection, date of data collection
• Consider how data files and directories may be organised & sorted
– 001, 002, 003, 004 can be used for sequential files
– YYYY-MM-DD (2012-12-04) is useful for organising by date (use year first)
• Identify different versions of content in the filename (and in the content)
– Creation date (YYYY-MM-DD)
– Version/draft number
• Consider how your filenames will look to others
– Avoid spaces: ‘My file.pdf’ becomes ‘My%20file.pdf’ on the web
– Avoid capitalisation: alters file sorting & CAUSES HEADACHES!
Golden Rule: Be Consistent
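One way to be consistent is to generate filenames rather than type them. A minimal Python sketch of the conventions above; the element order (participant ID, site, date, version), the `p` prefix, the padding widths and the `make_filename` helper are all assumptions for illustration, not a prescribed scheme:

```python
import re
from datetime import date


def make_filename(participant_id, site, collected, version, extension):
    """Build a consistent filename, e.g. 'p007_london_2012-12-04_v02.csv'.

    participant_id: integer, zero-padded to three digits so files sort in order
    site: free text; lower-cased, and anything other than a-z or 0-9
          (including spaces and accented letters) becomes a hyphen
    collected: datetime.date of data collection, written year first (YYYY-MM-DD)
    version: integer version/draft number, zero-padded to two digits
    """
    site_slug = re.sub(r"[^a-z0-9]+", "-", site.lower()).strip("-")
    return (f"p{participant_id:03d}_{site_slug}_"
            f"{collected.isoformat()}_v{version:02d}.{extension.lower()}")


if __name__ == "__main__":
    names = [
        make_filename(10, "Leeds", date(2013, 1, 5), 1, "csv"),
        make_filename(2, "St Albans Clinic", date(2012, 12, 4), 3, "CSV"),
    ]
    for name in sorted(names):
        print(name)
    # prints:
    # p002_st-albans-clinic_2012-12-04_v03.csv
    # p010_leeds_2013-01-05_v01.csv
```

Because IDs are zero-padded and dates are written year first, a plain alphabetical sort also puts files in participant order.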
Data Documentation
1. What is the context of creation?
• Why did you create it? For what purpose?
• What methodology did you use? What assumptions were made?
• Who is the target audience?
2. Collection and set of files:
• What information does each file contain?
• When was it created?
• By whom?
• What actions were performed?
• How do the files in the collection relate to each other?
3. Individual components:
• What is the meaning of this word/column/row, etc.?
• How are these items measured?
• What are the boundaries of the measurement?
What would someone want to know if they were looking at your data for the first time?
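Documentation of individual components can start from a template generated from the data itself. The Python sketch below assumes a UTF-8 CSV file with a header row and drafts a plain-text data dictionary with placeholders for meaning, units and valid range. `data_dictionary_template` is a hypothetical helper and the headings are illustrative; the researcher still has to answer the questions.

```python
import csv
import tempfile
from pathlib import Path


def data_dictionary_template(csv_path):
    """Draft a plain-text data dictionary for a CSV file.

    One entry per column, with TODO placeholders for the questions about
    individual components (meaning, how measured, boundaries). The example
    value comes from the first data row to help whoever fills it in.
    """
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = next(reader, [])
        first_row = next(reader, [])
    lines = [f"Data dictionary for {Path(csv_path).name}", ""]
    for i, column in enumerate(header):
        example = first_row[i] if i < len(first_row) else ""
        lines += [
            f"Column: {column}",
            f"  Example value: {example}",
            "  Meaning: TODO",
            "  Units / how measured: TODO",
            "  Valid range / boundaries: TODO",
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "blood_pressure.csv"
        sample.write_text("pid,sbp,visit_date\n001,120,2012-12-04\n", encoding="utf-8")
        print(data_dictionary_template(sample))
```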
6. Uncertain Provenance
Problem
1. “When was the data created and/or modified?”
2. “Who created/modified the data?”
3. “Why was it created and/or modified?”
Scenarios & Reasons
• Lack/loss of trust in information content
• Reluctance to use information content
(Potential) Solutions
Preventative:
• Limit updates to authorised users only
• Store change history
• Keep each version
Post event:
• Locate data creator & editor?
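Storing change history and keeping each version can be approximated with a simple append-only log that answers the when, who and why questions. A minimal Python sketch, assuming a local folder is an acceptable place to keep versions; `record_version`, the `changes.jsonl` log name and the numbered-copy naming scheme are hypothetical. It does not enforce access control, so limiting updates to authorised users still has to be handled by the storage system.

```python
import getpass
import hashlib
import json
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_version(data_file, reason, archive_dir, user=None):
    """Keep a copy of the current version and append a provenance entry.

    Each call copies data_file into archive_dir under a numbered name and
    appends one JSON line to archive_dir/changes.jsonl recording who made
    the change, when, why, and the SHA-256 checksum of that version.
    """
    data_file, archive_dir = Path(data_file), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    log = archive_dir / "changes.jsonl"
    existing = log.read_text(encoding="utf-8").splitlines() if log.exists() else []
    version = len(existing) + 1
    copy = archive_dir / f"{data_file.stem}_v{version:03d}{data_file.suffix}"
    shutil.copy2(data_file, copy)
    entry = {
        "version": version,
        "file": copy.name,
        "sha256": hashlib.sha256(copy.read_bytes()).hexdigest(),
        "who": user or getpass.getuser(),
        "when": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "why": reason,
    }
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        results = Path(tmp) / "results.csv"
        results.write_text("id,value\n1,10\n", encoding="utf-8")
        record_version(results, "initial data entry", Path(tmp) / "history", user="researcher")
        results.write_text("id,value\n1,12\n", encoding="utf-8")
        print(record_version(results, "corrected transcription error",
                             Path(tmp) / "history", user="researcher"))
```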
Things to Recommend
Advise researchers to:
1. Choose an appropriate storage location and create backups
2. Organise data in a consistent and logical manner
3. Document the data and information content (as well as its structure)
4. Consider how they will ensure that information can be accessed in the long term
5. Consider the potential for data sharing and ensure that sharing takes ethical considerations into account
A Few Good References
• Digital Curation Centre: http://www.dcc.ac.uk/resources
• MANTRA, Data Management training for PhD students: http://datalib.edina.ac.uk/mantra/
• UK Data Archive, Managing and Sharing Data: http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
• Cambridge University, RDM Guidance: http://www.lib.cam.ac.uk/dataman/index.html
• Australian National Data Service: http://ands.org.au/resource/data-management-planning.html
• LSHTM Research Data Management Support Service: http://blogs.lshtm.ac.uk/rdmss/