PART 2: DATA READINESS

CASRAI CONFERENCE
RECONNECT BIG DATA: THE ADVANCE OF DATA-DRIVEN DISCOVERY
OCTOBER 16, 2013

JANE FRY

Research Data Management: planning and implementation

Moore and Fry, CASRAI 2013 (October 16, 2013)

Agenda

Before data collection and processing
  Planning and organizing
Data collection and processing
After data collection and processing
  Metadata
Your turn

No data expertise needed!

Before: why?

Why an RDMP?
  Essential
    For any type of data
Why plan & organize?
  Journal requirements
    Be proactive
  Safety
    Protect your data
  Efficiency
    Easier to write up analyses and reports
  Quality
    Ensures high quality when guidelines are laid out at the beginning
Make a checklist or a template

If no RDMP

Potential problems
  Each type of data has its own ‘peculiarities’
    Will you remember them after 1, 2, 3, … years?
    What about other researchers?
  Loss of information
  Inability to share
  Inability to replicate
  Not receive all monies from grant
  Not as much analysis can be conducted
  Cannot submit to journals

Before: plan and organize

What type of data
How the data will be collected and processed
Where and how will they be stored
How will they be secured
Where will the back-up be kept
How will confidentiality be maintained
What metadata to record

Before: type of data?

The type chosen will determine the format to be used for analysis
Quantitative
  Microdata (.sav)
  Aggregate data (.xls)
Qualitative (NVivo)
Geospatial (vector and raster data)
Digital images (.jpeg)
Digital audio (.wav)
Digital video (.mp4)
Documentation, scripts (.doc)

Before: collection methods?

Depends on type of data
  Questionnaires
  Interviews
  Focus groups
  Observations
  Transcripts
  Newspaper articles
  Journals
  Diaries
  …

Before: collection methods (cont’d)

Partially determined by type of data
  Paper
  Face-to-face
  Web
  Telephone
  Snail mail
  E-mail
  Audio
  Video
  …

Before: storage?

Where will it be stored
  Your laptop, PC, smartphone
  Your researchers' laptop, PC, smartphone
  The shared drive in the office
  A dropbox
    Controlled by what country
What format will be used for storage
  Proprietary?
Preservation
  How
    Repository
  Where
    Your institution

Before: storage strategies

Two different locations
Two copies (at least)
Keep original data with no manipulations
  2 copies
What to keep
  Everything!
Use meaningful file names
  Set out format to be used
  Everyone has to use this format
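
The storage strategies above can be sketched in Python. This is only an illustration and not part of the original slides: the naming pattern (project_description_YYYYMMDD_vN.ext), the function names and the folder layout are all assumptions that a research team would replace with its own agreed format.

```python
import hashlib
import re
import shutil
from pathlib import Path

# Assumed naming convention: project_description_YYYYMMDD_vN.ext
NAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9-]+_\d{8}_v\d+\.[a-z0-9]+$")


def follows_convention(filename: str) -> bool:
    """Return True if the file name matches the agreed (assumed) pattern."""
    return NAME_PATTERN.match(filename) is not None


def sha256(path: Path) -> str:
    """Checksum used to confirm that a copy is identical to the original."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def back_up(original: Path, locations: list[Path]) -> list[Path]:
    """Copy the untouched original to each location and verify every copy."""
    copies = []
    for folder in locations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(original, folder / original.name))
        if sha256(copy) != sha256(original):
            raise IOError(f"Copy in {folder} does not match the original")
        copies.append(copy)
    return copies
```

Passing two folders on different drives or servers to back_up gives the "two copies in two different locations" described above, while the original file itself is never modified.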

Before: security issues?

How to secure data
  Determine beforehand
  To prevent unauthorized access
    Intentional
    Unintentional
Remote access – yes or no
  Off-site investigators
  Off-site research team members
Personal or sensitive data
  Separate location from the main dataset
  Limited, controlled access
  Encrypted

Before: back-up?

Where will all information be backed up
  If at your institution
    How often do they back up
    What are their policies for data retention
How often will you back up
  When the project is over
  After a year
  Monthly
  Weekly

Before: confidentiality?

What procedures will be taken to ensure confidentiality
Data must be anonymised (unless permission has been granted)
  Not possible to identify any individual
  Aggregate certain variables
    e.g., no low levels of geography
  Hide outliers by recoding
Record all decisions made
  Why this decision was made
  How the variable has been recoded
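
Hiding outliers by recoding, and recording the decision, could look like the following Python sketch. It is an illustration only: the ceiling of 150000, the income values, and the names DecisionLog and top_code are assumptions made for this example, not something specified in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionLog:
    """Keeps a record of every recoding decision and the reason for it."""
    entries: list = field(default_factory=list)

    def record(self, variable: str, how: str, why: str) -> None:
        self.entries.append({"variable": variable, "how": how, "why": why})


def top_code(values, ceiling, log, variable, why):
    """Recode every value above `ceiling` to `ceiling` so outliers are hidden."""
    log.record(variable, f"values above {ceiling} recoded to {ceiling}", why)
    return [min(v, ceiling) for v in values]


log = DecisionLog()
incomes = [32000, 41000, 58000, 950000]  # made-up values for illustration
safe_incomes = top_code(
    incomes, 150000, log, "income",
    "a single very high income could identify a respondent",
)
```

Because the log entry is written inside top_code, a recode cannot happen without its "why" and "how" being recorded at the same time.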

Before: confidentiality (cont’d)

Disclosure processing
  At what point in the data collection/processing
  Remove direct identifiers
    Names
    Addresses
    Telephone numbers
  Remove indirect identifiers
    Detailed geographic information
    Exact occupations
    Exact dates of events
      Birth
      Marriage
    Income
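
As a rough illustration of disclosure processing, the Python sketch below drops direct identifiers and coarsens two indirect ones. The field names, the sample record, and the choice to keep only the birth year and the first three characters of a postal code are assumptions made for this example; the appropriate level of detail depends on the dataset and on the ethics approval.

```python
from datetime import date

# Hypothetical field names; adapt them to the actual dataset.
DIRECT_IDENTIFIERS = {"name", "address", "telephone"}


def deidentify(record: dict) -> dict:
    """Drop direct identifiers and coarsen two indirect ones."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Exact date of birth becomes year of birth only
    if isinstance(cleaned.get("birth_date"), date):
        cleaned["birth_year"] = cleaned.pop("birth_date").year
    # Detailed geography becomes the first three characters of the postal code
    if "postal_code" in cleaned:
        cleaned["postal_area"] = cleaned.pop("postal_code")[:3]
    return cleaned


respondent = {  # made-up record for illustration
    "name": "A. Example",
    "address": "1 Sample Street",
    "telephone": "555-0100",
    "birth_date": date(1970, 4, 12),
    "postal_code": "A1B 2C3",
    "answer": "yes",
}
public_record = deidentify(respondent)
```

The original record is left untouched, which fits the earlier advice to keep the original data with no manipulations and to store personal data separately from the main dataset.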

Before: confidentiality (cont’d)

Legal and ethical obligations for managing and sharing data
  Ethics approval of your institution
  National Data Policy (regarding sharing of data)
    Canada (FIPPA)
    UK (ESRC)
How will confidentiality be maintained
  How to protect the privacy of the respondents
  How will the confidential information be handled and managed
  How to store respondents’ identification, if necessary
    Disclosure only if agreed to by respondent

Before: metadata?

Why keep metadata
  Researchers re-use data
    Secondary analysis
    Comparative research
    Teaching
    Replicate a study
  Requirement of our funders
  Good research practice
Start documenting at the very beginning of the project
End goal
  For this data to be replicated, if needed

Before: metadata (cont’d)

What to keep - everything!
  Research design
  Data collection
  Data preparation
  Questionnaires
  Interviewer instructions
  Meeting notes among researchers
    Details of decisions made
    Why certain decisions were made
      • e.g., if data collection not to be done on a certain date (Easter)

Before: metadata (cont’d)

Processes
  What worked
  What didn’t work
  Changes made after pilots conducted
    Why they were made
    Was another pilot conducted after changes made
  Any and all changes that were made or not made

Before: metadata (cont’d)

Consent of participant (if needed)
Disclosure processing
Names of everyone involved in the project
Source of all funding
  Monetary
  In kind
Source of any data used that is not from this data collection
  e.g., postal code conversion file

Before: a tip

If contracting out data processing
  Specify deliverables
    User Guide
      Date work performed
      Methodology of data cleaning, input, …
      Details of any new variables
        • Reasons for making them
        • Procedures, …
      Name and contact information
      Copy of questionnaire (if applicable)
    Raw data
      Questionnaires, interviews, …
Example of incomplete deliverable

Data collection and processing

Some of the steps are
  Transcribe
  Code
  Enter
  Check
  Validate
  Clean
  Anonymise
Vary depending on the type of data collected
One element in common with all types of data
  Must record metadata

And next

All the decisions have been made
Your checklist/template has been made
The data have been collected and processed
What now?
  Complete metadata on
    the data
    the documentation

After: data

Metadata on data: must be well organized
  How they were created
  How they were digitized
  How they were anonymised
  Explanation of codes used
  Explanation of classification scheme(s) used
    e.g., occupation
  Any and all changes that were made
  Access conditions
    e.g., member of your institution
  Terms of use
    e.g., academic or teaching purposes
    e.g., non-profit

After: data (cont’d)

Data metadata
  File names
    Meaningful
    Set up a system beforehand
    Make sure everyone sticks to it
  Versioning
    Set up a system beforehand
    What changes necessitate a new version number
      Version 1 to Version 2
        • e.g., one of the variables was coded incorrectly, therefore the dataset was replaced
    What changes do not necessitate a new version number
      Version 1 to Version 1.1
        • e.g., something small like a spelling mistake
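
The versioning rule above can be written down as a small Python function so that everyone on the team applies it the same way. This is a sketch under assumptions: the function name next_version and the labels "major" and "minor" are invented for this example.

```python
def next_version(current: str, change: str) -> str:
    """Return the next version label under the two-level scheme.

    `change` is "major" when the dataset itself is replaced (for example a
    variable was coded incorrectly) and "minor" for small fixes such as a
    spelling mistake in a label.
    """
    major, _, minor = current.partition(".")
    major_n = int(major)
    minor_n = int(minor) if minor else 0
    if change == "major":
        return str(major_n + 1)
    if change == "minor":
        return f"{major_n}.{minor_n + 1}"
    raise ValueError(f"Unknown change type: {change!r}")
```

For example, next_version("1", "major") gives "2" and next_version("1", "minor") gives "1.1", matching the two cases on the slide.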

After: data (cont’d)

Transcribing guidelines set up beforehand
  Transcribing conventions
  Instructions
  Guidelines
Variables
  Names
  Labels
    Comprehensible
    Unique
  Description
  Value labels
    Comprehensible
    Complete
  Associated question
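
One way to hold the variable metadata listed above, and to check that labels are unique and value labels complete, is sketched below in Python. The class and function names are assumptions made for this example, and "comprehensible" still needs a human reader.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Metadata for one variable, following the items listed above."""
    name: str
    label: str
    description: str = ""
    value_labels: dict = field(default_factory=dict)
    question: str = ""


def check_variables(variables, data):
    """Return a list of problems: duplicate labels or unlabelled values.

    `data` maps a variable name to the list of values observed for it.
    """
    problems = []
    labels = [v.label for v in variables]
    for v in variables:
        if labels.count(v.label) > 1:
            problems.append(f"{v.name}: label is not unique")
        unlabelled = set(data.get(v.name, [])) - set(v.value_labels)
        # Completeness is only checked for variables that declare value labels
        if v.value_labels and unlabelled:
            problems.append(f"{v.name}: no value label for {sorted(unlabelled)}")
    return problems
```

An empty list from check_variables means no duplicate labels and no unlabelled codes were found in the values supplied.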

After: data (cont’d)

Recoded variables
  Why they were needed (e.g., geographic location)
  Why they were done the way they were (e.g., age)
  All of the above list under variables
Derived variables
  Derived from what
    Be specific
  Why was it done
  All of the above list under variables
Missing values
  Codes used
    Should be consistent
  Reasons for missing values
Weighting variable(s)
  Description
  Formula(s)
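
Consistent missing-value codes and a documented weighting formula could be sketched in Python as follows. The codes (-7, -8, -9) and their reasons are assumptions chosen for this example, and the formula shown is an ordinary weighted mean, sum(value × weight) / sum(weight); the slides do not prescribe either.

```python
# Assumed codes; the point is that one table is used for every variable.
MISSING_CODES = {-7: "refused", -8: "don't know", -9: "not asked"}


def weighted_mean(values, weights):
    """Weighted mean that skips any value coded as missing."""
    pairs = [(v, w) for v, w in zip(values, weights) if v not in MISSING_CODES]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("No valid observations")
    return sum(v * w for v, w in pairs) / total_weight
```

Keeping the codes in a single table documents both the codes used and the reasons for missing values, and storing the formula as code alongside the codebook records exactly how the weight was applied.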

After: documentation

What to put in?
  Information for a researcher looking at your dataset for the first time with no prior knowledge
  As specific as possible
  All associated documentation about the research

After: documentation (cont’d)

Study background
  Purpose
  Time frame
  Geographic location
  Creator, principal investigator(s), other investigator(s)
  Funders
  Sampling design
    Description
    Size
  Any changes that were made

After: documentation (cont’d)

Study description
  Describes all aspects of the data collection and processing
  Data collection methodology
  Data preparation procedure
  Data validation protocols
  Instruments used
  Geographic coverage
  Temporal coverage
  Date of file creation
  Description of codes and classifications used

After: documentation (cont’d)

Codebook or user guide
  Original questionnaire/data collection instrument
  All interviewer instructions
  Any documentation describing variables
    Original ones
    Recoded
    Derived
    Weight
  Include formulas used to construct variables

A tip:

Much of the information in the previous slides may seem like common sense
You will be tempted not to follow it
  No time
  No facilities to record it
  Will do it later
  Minor change, therefore not important enough to mark down
  Of course, I will remember it!
What if?
  You forget to mark it down
  You forget to tell rest of research team
If you follow a checklist, neither you nor your team will be caught short!

In sum

In this section you have learned
  What to do before data collection
    Plan and organize
      Data type, data collection and processing, storage, security, back-up, confidentiality, metadata
    To make a checklist or template
  About data collection and processing (in brief)
  After data collection and processing
    Metadata
      Data, documentation

Exercise #2: Data Readiness

Is this data set ready for deposit? Why? Why not?

Dataset Title: Attitudes of Pets towards their Owners (October 1998)

Documentation available: The following text file:
“This survey was conducted by the Pet Researchers of Canada and was analysed by the Acme Research Company. There is no documentation available for this survey. Use basic survey methodology if necessary. There are some interesting results in this survey.”

Data available: A microdata file with some variable and value labels.
  Example 1:
    Name of variable: V35
    Frequency: Yes = 35%, No = 47%
  Example 2:
    Name of variable: Region of Country
    Frequency: 1 = 12%; 2 = 32%; 3 = 35%; 4 = 15%; 5 = 4%

Contact Information

Pat Moore
Associate University Librarian: Research, Scholarship and Technology
Carleton University
613.520.2600 [email protected]

Jane Fry
Data Specialist
Carleton University
613.520.2600 [email protected]
