Post on 02-Jan-2016
PART 2: DATA READINESS
CASRAI CONFERENCE
RECONNECT BIG DATA:
THE ADVANCE OF DATA-DRIVEN DISCOVERY
OCTOBER 16, 2013
JANE FRY
Research Data Management: planning and implementation
Moore and Fry, CASRAI 2013 (October 16, 2013)
Agenda
Before data collection and processing
  Planning and organizing
Data collection and processing
After data collection and processing
  Metadata
Your turn
No data expertise needed!
Before: why?
Why an RDMP?
  Essential
    For any type of data
Why plan & organize?
  Journal requirements
    Be proactive
  Safety
    Protect your data
  Efficiency
    Easier to write up analyses and reports
  Quality
    Ensures high quality when guidelines laid out at beginning
Make a checklist or a template
If no RDMP
Potential problems
  Each type of data has its own ‘peculiarities’
    Will you remember them after 1, 2, 3, … years
    What about other researchers
  Loss of information
  Inability to share
  Inability to replicate
  Not receive all monies from grant
  Not as much analysis can be conducted
  Cannot submit to journals
Before: plan and organize
What type of data
How the data will be collected and processed
Where and how will they be stored
How will they be secured
Where will the back-up be kept
How will confidentiality be maintained
What metadata to record
Before: type of data?
The type chosen will determine the format to be used for analysis
  Quantitative
    Microdata (.sav)
    Aggregate data (.xls)
  Qualitative (NVivo)
  Geospatial (vector and raster data)
  Digital images (.jpeg)
  Digital audio (.wav)
  Digital video (.mp4)
  Documentation, scripts (.doc)
Before: collection methods?
Depends on type of data
  Questionnaires
  Interviews
  Focus groups
  Observations
  Transcripts
  Newspaper articles
  Journals
  Diaries
  …
Before: collection methods (cont’d)
Partially determined by type of data
  Paper
  Face-to-face
  Web
  Telephone
  Snail mail
  E-mail
  Audio
  Video
  …
Before: storage?
Where will it be stored
  Your laptop, PC, smartphone
  Your researchers' laptop, PC, smartphone
  The shared drive in the office
  A Dropbox
    Controlled by what country
What format will be used for storage
  Proprietary?
Preservation
  How
Repository
  Where
    Your institution
Before: storage strategies
Two different locations
Two copies (at least)
Keep original data with no manipulations
  2 copies
What to keep
  Everything!
Use meaningful file names
  Set out format to be used
  Everyone has to use this format
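The storage strategies on this slide (at least two copies, in two locations, with the original data left unmanipulated) can be sketched in Python. This is only an illustration and not part of the original presentation: the function names `sha256_of` and `copy_and_verify` and the directory layout are hypothetical, and only the standard library is used.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(original: Path, backup_dirs: list[Path]) -> list[Path]:
    """Copy the untouched original into each directory and check each copy."""
    expected = sha256_of(original)
    copies = []
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / original.name
        shutil.copy2(original, target)  # the original file is never modified
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies
```

Calling `copy_and_verify(Path("raw/survey_raw.csv"), [Path("site_a"), Path("site_b")])` (hypothetical paths) would leave the raw file as it is and return the two verified copies. In practice the two directories would sit on physically separate storage.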
Before: security issues?
How to secure data
  Determine beforehand
To prevent unauthorized access
  Intentional
  Unintentional
Remote access – yes or no
  Off-site investigators
  Off-site research team members
Personal or sensitive data
  Separate location from the main dataset
  Limited, controlled access
  Encrypted
Before: back-up?
Where will all information be backed up
  If at your institution
    How often do they back up
    What are their policies for data retention
How often will you back up
  When the project is over
  After a year
  Monthly
  Weekly
Before: confidentiality?
What procedures will be taken to ensure confidentiality
Data must be anonymised (unless permission has been granted)
  Not possible to identify any individual
  Aggregate certain variables
    e.g., no low levels of geography
  Hide outliers by recoding
Record all decisions made
  Why this decision was made
  How the variable has been recoded
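As a rough illustration of aggregating a variable, hiding outliers by recoding, and recording each decision, here is a minimal Python sketch. The helper names (`top_code`, `age_band`, `DecisionLog`), the cap value and the band width are assumptions made for the example, not procedures from the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionLog:
    """Keeps a record of every recoding decision: what, how and why."""
    entries: list = field(default_factory=list)

    def record(self, variable: str, how: str, why: str) -> None:
        self.entries.append({"variable": variable, "how": how, "why": why})


def top_code(values, cap):
    """Recode any value above `cap` to `cap`, which hides high outliers."""
    return [min(value, cap) for value in values]


def age_band(age: int, width: int = 10) -> str:
    """Aggregate an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


log = DecisionLog()
incomes = top_code([20000, 55000, 900000], cap=150000)
log.record("income", "values above 150000 recoded to 150000",
           "a single very high income could identify a respondent")
bands = [age_band(age) for age in [34, 40, 67]]
log.record("age", "exact age grouped into 10-year bands",
           "exact age combined with region could identify a respondent")
```

The log entries are what would later go into the metadata, so that another researcher can see both how a variable was recoded and why.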
Before: confidentiality (cont’d)
Disclosure processing
  At what point in the data collection/processing
  Remove direct identifiers
    Names
    Addresses
    Telephone numbers
  Remove indirect identifiers
    Detailed geographic information
    Exact occupations
    Exact dates of events
      Birth
      Marriage
    Income
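A small Python sketch of disclosure processing on a single record follows. It assumes, purely for illustration, that records are dictionaries, that the direct identifiers are stored under the keys `name`, `address` and `telephone`, and that the birth date is an ISO-formatted string. None of these names come from the presentation.

```python
# Hypothetical field names for the direct identifiers listed on the slide.
DIRECT_IDENTIFIERS = {"name", "address", "telephone"}


def deidentify(record: dict) -> dict:
    """Drop direct identifiers and coarsen an exact birth date to a year."""
    cleaned = {key: value for key, value in record.items()
               if key not in DIRECT_IDENTIFIERS}
    if "birth_date" in cleaned:
        # Assumes an ISO date string such as "1975-04-12"; keep the year only.
        cleaned["birth_year"] = int(cleaned.pop("birth_date")[:4])
    return cleaned


respondent = {
    "id": 17,
    "name": "A. Respondent",
    "address": "1 Example Street",
    "telephone": "555-0100",
    "birth_date": "1975-04-12",
    "answer": "yes",
}
safe = deidentify(respondent)
```

The input record is left unchanged, which fits the earlier advice to keep the original data with no manipulations.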
Before: confidentiality (cont’d)
Legal and ethical obligations to managing and sharing data
  Ethics approval of your institution
  National Data Policy (regarding sharing of data)
    Canada (FIPPA)
    UK (ESRC)
How will confidentiality be maintained
  How to protect the privacy of the respondents
  How will the confidential information be handled and managed
  How to store respondents’ identification, if necessary
  Disclosure only if agreed to by respondent
Before: metadata?
Why keep metadata
  Researchers re-use data
    Secondary analysis
    Comparative research
    Teaching
    Replicate a study
  Requirement of our funders
  Good research practice
Start documenting at the very beginning of the project
End goal
  For this data to be replicated, if needed
Before: metadata (cont’d)
What to keep - everything!
  Research design
  Data collection
  Data preparation
  Questionnaires
  Interviewer instructions
  Meeting notes among researchers
    Details of decisions made
    Why certain decisions were made
      • e.g., if data collection not to be done on a certain date (Easter)
Before: metadata (cont’d)
Processes
  What worked
  What didn’t work
  Changes made after pilots conducted
    Why they were made
    Was another pilot conducted after changes made
  Any and all changes that were made or not made
Before: metadata (cont’d)
Consent of participant (if needed)
Disclosure processing
Names of everyone involved in the project
Source of all funding
  Monetary
  In kind
Source of any data used that is not from this data collection
  e.g., postal code conversion file
Before: a tip
If contracting out data processing
  Specify deliverables
    User Guide
      Date work performed
      Methodology of data cleaning, input, …
      Details of any new variables
        • Reasons for making them
        • Procedures, …
      Name and contact information
      Copy of questionnaire (if applicable)
    Raw data
      Questionnaires, interviews, …
Example of incomplete deliverable
Data collection and processing
Some of the steps are
  Transcribe
  Code
  Enter
  Check
  Validate
  Clean
  Anonymise
Vary depending on the type of data collected
One element in common with all types of data
  Must record metadata
And next
All the decisions have been made
Your checklist/template has been made
The data have been collected and processed
What now?
  Complete metadata on
    the data
    the documentation
After: data
Metadata on data: must be well organized
  How they were created
  How they were digitized
  How they were anonymised
  Explanation of codes used
  Explanation of classification scheme(s) used
    e.g., occupation
  Any and all changes that were made
  Access conditions
    e.g., member of your institution
  Terms of use
    e.g., academic or teaching purposes
    e.g., non-profit
After: data (cont’d)
Data metadata
  File names
    Meaningful
    Set up a system beforehand
    Make sure everyone sticks to it
  Versioning
    Set up a system beforehand
    What changes necessitate a new version number
      Version 1 to Version 2
        • e.g., one of the variables was coded incorrectly, therefore the dataset was replaced
    What changes do not necessitate a new version number
      Version 1 to Version 1.1
        • e.g., something small like a spelling mistake
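One way to make the versioning and file-naming rules on this slide mechanical is sketched below in Python. The function names, the `substantive` flag and the file-name pattern are assumptions for illustration. The slide only asks that some system be agreed beforehand and followed by everyone.

```python
def next_version(current: str, substantive: bool) -> str:
    """Return the next version label for a dataset.

    A substantive change (e.g., a variable was coded incorrectly and the
    dataset was replaced) moves "1" or "1.3" to "2". A small change
    (e.g., a spelling mistake in a label) moves "1" to "1.1".
    """
    major_text, _, minor_text = current.partition(".")
    major = int(major_text)
    minor = int(minor_text) if minor_text else 0
    if substantive:
        return str(major + 1)
    return f"{major}.{minor + 1}"


def versioned_file_name(project: str, description: str,
                        version: str, extension: str) -> str:
    """Build a meaningful file name, e.g. 'petsurvey_microdata_v1.1.sav'."""
    return f"{project}_{description}_v{version}.{extension}"
```

For example, `next_version("1", substantive=False)` gives `"1.1"` and `next_version("1.1", substantive=True)` gives `"2"`. Whichever scheme is chosen, the point from the slide stands: decide on it before the project starts.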
After: data (cont’d)
Transcribing guidelines set up beforehand
  Transcribing conventions
  Instructions
  Guidelines
Variables
  Names
  Labels
    Comprehensible
    Unique
  Description
  Value labels
    Comprehensible
    Complete
  Associated question
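The variable-level checks on this slide (labels present, names unique, value labels complete, an associated question recorded) lend themselves to a small automated check. The sketch below is an assumption about how one might do this in Python. The `Variable` class, `codebook_problems` function and the message wording are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    name: str
    label: str
    question: str
    value_labels: dict = field(default_factory=dict)


def codebook_problems(variables, observed_values):
    """List documentation gaps for each variable.

    `observed_values` maps a variable name to the codes found in the data.
    """
    problems = []
    seen = set()
    for var in variables:
        if var.name in seen:
            problems.append(f"{var.name}: duplicate variable name")
        seen.add(var.name)
        if not var.label.strip():
            problems.append(f"{var.name}: missing variable label")
        if not var.question.strip():
            problems.append(f"{var.name}: no associated question")
        unlabeled = sorted(set(observed_values.get(var.name, []))
                           - set(var.value_labels))
        if unlabeled:
            problems.append(f"{var.name}: no value label for codes {unlabeled}")
    return problems
```

A variable named only `V35`, with no label, no question and an unexplained code in the data, would produce three entries in the returned list. A fully documented variable produces none.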
After: data (cont’d)
Recoded variables
  Why they were needed (e.g., geographic location)
  Why they were done the way they were (e.g., age)
  All of the above list under variables
Derived variables
  Derived from what
    Be specific
  Why was it done
  All of the above list under variables
Missing values
  Codes used
    Should be consistent
  Reasons for missing values
Weighting variable(s)
  Description
  Formula(s)
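To show why consistent missing-value codes and a documented weighting formula matter, here is a minimal Python sketch of a weighted frequency table. The specific codes in `MISSING_CODES` and the function name are invented for the example. The formula used is simply each category's share of the total weight among non-missing responses.

```python
# Hypothetical missing-value codes, used the same way for every variable.
MISSING_CODES = {-9: "Refused", -8: "Don't know", -7: "Not applicable"}


def weighted_percentages(values, weights):
    """Weighted percentage of each valid value, excluding missing codes.

    percentage(v) = 100 * (sum of weights where value == v)
                        / (sum of weights over all non-missing values)
    """
    totals = {}
    for value, weight in zip(values, weights):
        if value in MISSING_CODES:
            continue
        totals[value] = totals.get(value, 0.0) + weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {value: 100.0 * total / grand_total
            for value, total in totals.items()}
```

If the missing codes differed from variable to variable, a function like this would need a separate rule for each one, which is exactly the kind of detail that gets forgotten unless it is written down.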
After: documentation
What to put in?
  Information for a researcher looking at your dataset for the first time with no prior knowledge
  As specific as possible
  All associated documentation about the research
After: documentation (cont’d)
Study background
  Purpose
  Time frame
  Geographic location
  Creator, principal investigator(s), other investigator(s)
  Funders
  Sampling design
    Description
    Size
  Any changes that were made
After: documentation (cont’d)
Study description
  Describes all aspects of the data collection and processing
  Data collection methodology
  Data preparation procedure
  Data validation protocols
  Instruments used
  Geographic coverage
  Temporal coverage
  Date of file creation
  Description of codes and classifications used
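The study description items on this slide can double as the checklist recommended earlier. The Python sketch below assumes the description is kept as a dictionary. The key names are a hypothetical rendering of the slide's items, not a standard schema.

```python
# Keys mirror the items listed on the slide; the exact names are assumptions.
REQUIRED_FIELDS = [
    "data_collection_methodology",
    "data_preparation_procedure",
    "data_validation_protocols",
    "instruments_used",
    "geographic_coverage",
    "temporal_coverage",
    "date_of_file_creation",
    "codes_and_classifications",
]


def missing_fields(study_description: dict) -> list:
    """Return the checklist items that are absent or left blank."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = study_description.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing
```

Running `missing_fields` before deposit gives a quick list of what still has to be written up.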
After: documentation (cont’d)
Codebook or user guide
  Original questionnaire/data collection instrument
  All interviewer instructions
  Any documentation describing variables
    Original ones
    Recoded
    Derived
    Weight
  Include formulas used to construct variables
A tip:
Much of the information in the previous slides may seem like common sense
You will be tempted not to follow it
  No time
  No facilities to record it
  Will do it later
  Minor change, therefore not important enough to mark down
  Of course, I will remember it!
What if?
  You forget to mark it down
  You forget to tell rest of research team
If you follow a checklist, neither you nor your team will be caught short!
In sum
In this section you have learned
  What to do before data collection
    Plan and organize
      Data type, data collection and processing, storage, security, back-up, confidentiality, metadata
    To make a checklist or template
  About data collection and processing (in brief)
  After data collection and processing
    Metadata
      Data, documentation
Exercise #2: Data Readiness
Is this data set ready for deposit? Why? Why not?
Dataset Title: Attitudes of Pets towards their Owners (October 1998)
Documentation available: The following text file:
  “This survey was conducted by the Pet Researchers of Canada and was analysed by the Acme Research Company. There is no documentation available for this survey. Use basic survey methodology if necessary. There are some interesting results in this survey.”
Data available: A microdata file with some variable and value labels.
Example 1:
  Name of variable: V35
  Frequency: Yes = 35%, No = 47%
Example 2:
  Name of variable: Region of Country
  Frequency: 1 = 12%; 2 = 32%; 3 = 35%; 4 = 15%; 5 = 4%
Contact Information
Pat Moore
  Associate University Librarian: Research, Scholarship and Technology
  Carleton University
  613.520.2600 x2745
  pat.moore@carleton.ca
Jane Fry
  Data Specialist
  Carleton University
  613.520.2600 x1121
  jane.fry@carleton.ca