Data quality control, Data formats and preservation, Versioning and authenticity, Data storage...

Date post: 11-Jan-2016
Upload: phoebe-obrien
Data quality control, Data formats and preservation, Versioning and authenticity, Data storage Managing research data well workshop London, 30 June 2009 Manchester, 1 July 2009
Transcript
Data quality control, Data formats and preservation, Versioning and authenticity, Data storage

Managing research data well workshop London, 30 June 2009

Manchester, 1 July 2009

Good data management

• good research
• high quality data
• needs to be planned
• specific for purpose
• data can be understood and used now and in future
• data can then be shared and re-used

Can you understand / use these data?

SrvMthdDraft.doc

SrvMthdFinal.doc

SrvMthdLastOne.doc

SrvMthdRealVersion.doc

Quality control

Data quality control at various stages:

• data collection
– e.g. instrument calibration; expert opinion; multiple measurements; computer assisted interviews

• data entry, digitisation, transcription and coding - standardised and consistent procedures
– e.g. set up validation rules for data entry; use input masks; detailed variable labelling; missing value coding; use controlled vocabularies or choice lists; best structure to organise data and data files

• data checking and verifying - automated and/or manual
– e.g. double entry; check for out-of-range values; apply random sample validation; statistical analyses (descriptives, frequencies, means, range, clustering) to detect errors or find anomalous values; verify data completeness
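
Two of the checks named above, out-of-range detection and double entry, can be automated with very little code. The following is a minimal sketch, not a prescribed procedure: the function names, the missing-value code -9, the age range 0 to 120 and the sample records are all illustrative assumptions.

```python
def out_of_range(values, low, high, missing_code=None):
    """Return (index, value) pairs outside [low, high], skipping the missing-value code."""
    return [
        (i, v)
        for i, v in enumerate(values)
        if v != missing_code and not (low <= v <= high)
    ]


def double_entry_mismatches(first_pass, second_pass):
    """Compare two independent keyings of the same records, field by field."""
    if len(first_pass) != len(second_pass):
        raise ValueError("both passes must contain the same number of records")
    mismatches = []
    for row, (a, b) in enumerate(zip(first_pass, second_pass)):
        for field in a.keys() | b.keys():
            if a.get(field) != b.get(field):
                mismatches.append((row, field, a.get(field), b.get(field)))
    return sorted(mismatches)


# Hypothetical data: ages with one keying error (210) and one missing value (-9).
ages = [34, 29, 210, 41, -9]
flagged = out_of_range(ages, 0, 120, missing_code=-9)

# Hypothetical double entry: the second keying transposes 29 into 92.
first = [{"id": 1, "age": 34}, {"id": 2, "age": 29}]
second = [{"id": 1, "age": 34}, {"id": 2, "age": 92}]
mismatches = double_entry_mismatches(first, second)

print(flagged)     # [(2, 210)]
print(mismatches)  # [(1, 'age', 29, 92)]
```

Flagged values still need a human decision: the code only points to records worth checking against the source.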

Data formats

• choice of software format for digital data:
– planned data analyses
– software availability
– hardware used
– discipline specific standards and customs

• digital data software dependent

• digital data endangered by obsolescence of software/hardware

• best formats for long-term preservation - standard formats, interchangeable formats, open formats

– e.g. tab-delimited; comma-delimited (CSV); ASCII; OpenDocument format; SPSS portable; XML

Data format conversions

• convert data for preservation or back-up, e.g. export, save as

• beware of conversion errors:
– loss of internal metadata
> e.g. convert MS Access to tab-delimited tables
– loss of editing, formatting, formulae
> e.g. convert MS Word to RTF
– truncation or loss of data
> e.g. string variables lost in SPSS – STATA conversion

• check for errors and changes after conversion

Example 1: MS Excel to tab-delimited
Example 2: Word to XML
Example 3: Proprietary audio file (DVF) to WAV
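
Checking for changes after conversion can be done by converting, reading the result back and comparing it cell by cell with the original. The sketch below assumes the data are already in memory as a list of rows; the helper names and the sample rows are hypothetical, and it uses only Python's standard `csv` module rather than any particular package's export routine.

```python
import csv
import io


def to_tab_delimited(rows):
    """Serialise rows (lists of cell values) as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def from_tab_delimited(text):
    """Read tab-delimited text back into rows; every cell comes back as a string."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


def conversion_differences(original, converted):
    """List (row, column, before, after) for every cell that changed."""
    diffs = []
    if len(original) != len(converted):
        diffs.append(("row count", None, len(original), len(converted)))
    for r, (before_row, after_row) in enumerate(zip(original, converted)):
        if len(before_row) != len(after_row):
            diffs.append((r, "column count", len(before_row), len(after_row)))
        for c, (before, after) in enumerate(zip(before_row, after_row)):
            if before != after:
                diffs.append((r, c, before, after))
    return diffs


# Hypothetical table: a numeric id, a text code with a leading zero, a decimal value.
original = [["id", "code", "income"], [1, "0123", 2500.50]]
converted = from_tab_delimited(to_tab_delimited(original))
diffs = conversion_differences(original, converted)

print(diffs)  # [(1, 0, 1, '1'), (1, 2, 2500.5, '2500.5')]
```

Here the values survive but their types do not: the numbers come back as text. This is a small instance of the loss of internal metadata that the slide warns about, and it is the kind of change that should be documented alongside the converted file.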

MS Excel format

Tab–delimited text format

Version control

• keep track of different copies or versions of data files

• which methods:
› single site vs. across locations
› single vs. multiple users
› different versions to be stored vs. files to be synchronised

• single user of data files:
› file naming – unique file names with date or version number (avoid spaces!), e.g. FoodInterview_1_draft; FoodInterview_1_final; HealthTests_06-04-2008; BGHSurveyProcedures_00_04
› version control table or file history within or alongside data file
› version control facility within software, e.g. MS Windows software

• multiple users of data files:
› same as above
› control rights to file editing: read/write permissions, e.g. Windows Explorer
› versioning/file sharing software: check files out/in, e.g. SVN, VSS, Google Docs, Amazon S3
› manual merging of multiple entries/edits

• synchronise files, e.g. MS SyncToy software
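
The file-naming advice (unique names carrying a date or version number, no spaces) can be applied consistently by generating names rather than typing them. This is one possible convention, sketched under assumptions: the `_vNN_` version pattern, the ISO date format (YYYY-MM-DD, which sorts chronologically) and both function names are illustrative choices, not something the workshop prescribes.

```python
import re
from datetime import date


def versioned_name(base, version, when, extension):
    """Build a name such as Food_Interview_v01_2008-04-06.doc, with no spaces."""
    safe = re.sub(r"\s+", "_", base.strip())    # spaces become underscores
    safe = re.sub(r"[^A-Za-z0-9_-]", "", safe)  # drop other awkward characters
    return f"{safe}_v{version:02d}_{when.isoformat()}.{extension.lstrip('.')}"


def latest_version(names):
    """Return the name with the highest _vNN_ number, or None if none match."""
    pattern = re.compile(r"_v(\d+)_")
    best_number, best_name = -1, None
    for name in names:
        match = pattern.search(name)
        if match and int(match.group(1)) > best_number:
            best_number, best_name = int(match.group(1)), name
    return best_name


name = versioned_name("Food Interview", 1, date(2008, 4, 6), "doc")
newest = latest_version([
    "SrvMthd_v02_2009-06-01.doc",
    "SrvMthd_v10_2009-06-30.doc",
    "SrvMthd_v01_2009-05-12.doc",
])

print(name)    # Food_Interview_v01_2008-04-06.doc
print(newest)  # SrvMthd_v10_2009-06-30.doc
```

Compared with names like SrvMthdFinal.doc and SrvMthdLastOne.doc, the version number makes the order of files unambiguous, and comparing numbers instead of text keeps v10 after v02.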

Authenticity of data

• master files
• assign responsibility for master files
• record changes to master files

Data storage

• digital storage media unreliable
• file formats and physical storage media ultimately become obsolete
• optical (CD, DVD) and magnetic media (hard drive, tapes) vulnerable and subject to physical degradation

Best practice:
• use data formats with long-term readability
• storage strategy with at least two different forms of storage
• copy/migrate data files to new media between two and five years after first created
• check data integrity of stored data files at regular intervals (checksum)
• know your back-up strategy: institutional/personal; network server/PC/laptop
• maintain original copy, external local copy and external remote copy
• test file recovery
• Data Protection Act and data back-up – may require minimal data copies for personal data; secure storage
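
The regular integrity check mentioned above can be done by recording a checksum for each stored file and recomputing it later. The sketch below is one way to do this with Python's standard library. The choice of SHA-256, the function names and the temporary demonstration file are assumptions; the slide itself only says "checksum".

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's relative path to its checksum."""
    folder = Path(folder)
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def verify(folder, manifest):
    """Return the files whose checksum no longer matches, or that have gone missing."""
    current = build_manifest(folder)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)


# Demonstration on a throwaway folder: record checksums, then alter one file.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "survey.tab"
    data_file.write_text("id\tage\n1\t34\n")
    manifest = build_manifest(tmp)
    unchanged = verify(tmp, manifest)
    data_file.write_text("id\tage\n1\t43\n")  # simulate silent corruption
    changed = verify(tmp, manifest)

print(unchanged)  # []
print(changed)    # ['survey.tab']
```

In practice the manifest would be saved alongside each copy (original, local and remote) so that a damaged copy can be identified and restored from an intact one.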

Example: data storage and preservation at UKDA

Multi-copy, multi-storage media and multi-version resilience:

• preservation copy (UKDA)
• shadow copy (UKDA)
• dissemination copy to reduce load on main system
• near-site online copy (on campus)
• off-site online copy
• tape-based offline copy (UKDA)

Diagram labels: scheduled nightly; robotic 3-monthly

Good data management practice

• plan data management early
• assign roles and responsibilities
• design data management according to needs and purpose of research
• data management throughout research

Resources

• ESDS (2008). Guide to good practice: micro data handling and security. http://www.esds.ac.uk/news/publications/microDataHandlingandSecurity.pdf

• Finch, L. & Webster, J. (2008). Caring for CDs and DVDs. NPO Preservation Guidance. Preservation in Practice Series. London, National Preservation Office. Available at http://www.bl.uk/npo/pdf/cd.pdf

• UK Data Archive (2009). Manage and Share Data. http://www.data-archive.ac.uk/sharing/

See: http://www.data-archive.ac.uk/sharing/furtherstorage.asp

