Data quality control, data formats and preservation, versioning and authenticity, data storage
Managing research data well workshop
London, 30 June 2009
Manchester, 1 July 2009
2
Good data management
• good research
• high quality data
• needs to be planned
• specific for purpose
• data can be understood and used now and in future
• data can then be shared and re-used
3
Can you understand / use these data?
SrvMthdDraft.doc
SrvMthdFinal.doc
SrvMthdLastOne.doc
SrvMthdRealVersion.doc
4
Quality control
Data quality control at various stages:
• data collection
– e.g. instrument calibration; expert opinion; multiple measurements; computer assisted interviews
• data entry, digitisation, transcription and coding – standardised and consistent procedures
– e.g. set up validation rules for data entry; use input masks; detailed variable labelling; missing value coding; use controlled vocabularies or choice lists; best structure to organise data and data files
• data checking and verifying – automated and/or manual
– e.g. double entry; check for out-of-range values; apply random sample validation; statistical analyses (descriptives, frequencies, means, range, clustering) to detect errors or find anomalous values; verify data completeness
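Two of the checks listed above, out-of-range detection and double entry, can be sketched in a few lines of Python. This is only an illustrative sketch: the field names, the valid range and the record layout (dictionaries keyed by a record id) are assumptions made for the example, not part of the workshop material.

```python
def out_of_range(records, field, low, high):
    """Return the records whose value for `field` lies outside [low, high]."""
    return [r for r in records if not (low <= r[field] <= high)]


def double_entry_mismatches(first_pass, second_pass):
    """Compare two independent entries of the same records.

    Both arguments map a record id to a dict of field values (hypothetical
    layout). Returns (record id, field, first value, second value) for
    every field on which the two passes disagree.
    """
    mismatches = []
    for rec_id, first in first_pass.items():
        second = second_pass.get(rec_id, {})
        for field, value in first.items():
            if second.get(field) != value:
                mismatches.append((rec_id, field, value, second.get(field)))
    return mismatches


# Hypothetical survey records: an age of 340 is almost certainly a typing error.
records = [{"id": 1, "age": 34}, {"id": 2, "age": 340}]
suspect = out_of_range(records, "age", 0, 120)

# The same two records typed in twice by different people.
entry_a = {1: {"age": 34}, 2: {"age": 43}}
entry_b = {1: {"age": 34}, 2: {"age": 34}}
disagreements = double_entry_mismatches(entry_a, entry_b)
```

Records flagged by either check still need a manual look at the source material; the code only narrows down where to look.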
5
Data formats
• choice of software format for digital data:
– planned data analyses
– software availability
– hardware used
– discipline specific standards and customs
• digital data software dependent
• digital data endangered by obsolescence of software/hardware
• best formats for long-term preservation - standard formats, interchangeable formats, open formats
– e.g. tab-delimited; comma-delimited (CSV); ASCII; OpenDocument format; SPSS portable; XML
6
Data format conversions
• convert data for preservation or back-up, e.g. export, save as
• beware of conversion errors:
– loss of internal metadata
› e.g. convert MS Access to tab-delimited tables
– loss of editing, formatting, formulae
› e.g. convert MS Word to RTF
– truncation or loss of data
› e.g. string variables lost in SPSS to Stata conversion
• check for errors and changes after conversion
Example 1: MS Excel to tab-delimited
Example 2: Word to XML
Example 3: Proprietary audio file (DVF) to WAV
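Checking for errors and changes after a conversion can be partly automated. The sketch below, which uses only Python's standard `csv` module, converts comma-delimited text to tab-delimited text and then compares the two cell by cell. The function names and the sample data are hypothetical, and the approach only covers plain tabular text, not formatting, formulae or internal metadata.

```python
import csv
import io


def csv_to_tab(csv_text):
    """Convert comma-delimited text to tab-delimited text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


def conversion_differences(csv_text, tab_text):
    """List differences in row count or row content between the two versions."""
    original = list(csv.reader(io.StringIO(csv_text)))
    converted = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))
    diffs = []
    if len(original) != len(converted):
        diffs.append(("row count", len(original), len(converted)))
    for index, (before, after) in enumerate(zip(original, converted)):
        if before != after:
            diffs.append((f"row {index}", before, after))
    return diffs


# Hypothetical data with a comma inside a quoted string value.
source = 'id,comment\n1,"likes tea, not coffee"\n'
converted = csv_to_tab(source)
clean = conversion_differences(source, converted)

# Simulate a string value truncated during conversion.
truncated = converted.replace("not coffee", "not")
problems = conversion_differences(source, truncated)
```

An empty list from `conversion_differences` means the cell values survived the conversion; any entry points to the row that needs checking by hand.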
7
[Figure panels: MS Excel format; Tab-delimited text format]
8
Version control
• keep track of different copies or versions of data files
• which methods:
› single site vs. across locations
› single vs. multiple users
› different versions to be stored vs. files to be synchronised
• single user of data files:
› file naming – unique file names with date or version number (avoid spaces!), e.g. FoodInterview_1_draft; FoodInterview_1_final; HealthTests_06-04-2008; BGHSurveyProcedures_00_04
› version control table or file history within or alongside data file
› version control facility within software, e.g. MS Windows software
• multiple users of data files:
› same as above
› control rights to file editing: read/write permissions, e.g. Windows Explorer
› versioning/file sharing software: check files out/in, e.g. SVN, VSS, Google Docs, Amazon S3
› manual merging of multiple entries/edits
• synchronise files, e.g. MS SyncToy software
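The file naming advice for single users (unique names, a date or version number, no spaces) is easy to apply consistently with a small helper. The following is a minimal sketch; the function `versioned_name` and its parameters are hypothetical, and it writes dates in ISO order (year-month-day) so that names sort chronologically, which is a design choice of this example and differs from the day-first date in the example names above.

```python
import re
from datetime import date


def versioned_name(base, version=None, on=None, ext=""):
    """Build a file name carrying either a version number or a date.

    `version` is a (major, minor) pair; if it is omitted, the date `on`
    (default: today) is used instead. Spaces are removed from `base`.
    """
    stem = re.sub(r"\s+", "", base)
    if version is not None:
        major, minor = version
        suffix = f"{major:02d}_{minor:02d}"
    else:
        suffix = (on or date.today()).isoformat()
    return f"{stem}_{suffix}{ext}"


by_version = versioned_name("BGH Survey Procedures", version=(0, 4))
by_date = versioned_name("HealthTests", on=date(2008, 4, 6), ext=".csv")
```

Whichever convention is chosen, the main point is to apply the same one to every file in a project.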
9
Authenticity of data
• master files
• assign responsibility for master files
• record changes to master files
10
Data storage
• digital storage media unreliable
• file formats and physical storage media ultimately become obsolete
• optical (CD, DVD) and magnetic media (hard drive, tapes) vulnerable and subject to physical degradation
Best practice:
• use data formats with long-term readability
• storage strategy with at least two different forms of storage
• copy/migrate data files to new media between two and five years after first created
• check data integrity of stored data files at regular intervals (checksum)
• know your back-up strategy: institutional/personal; network server/PC/laptop
• maintain original copy, external local copy and external remote copy
• test file recovery
• Data Protection Act and data back-up – may require minimal data copies for personal data; secure storage
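The checksum check mentioned under best practice can be sketched with Python's standard `hashlib` module: record a checksum for each stored file once, then recompute at regular intervals and compare. The function names, the choice of SHA-256 and the in-memory manifest are assumptions of this sketch; in practice the manifest would be saved alongside each copy of the data.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksums(folder):
    """Map each file name directly inside `folder` to its checksum."""
    return {
        p.name: sha256_of(p)
        for p in sorted(Path(folder).iterdir())
        if p.is_file()
    }


def changed_files(folder, manifest):
    """Return names from `manifest` whose file is now missing or altered."""
    current = record_checksums(folder)
    return sorted(
        name for name, digest in manifest.items()
        if current.get(name) != digest
    )
```

A typical use is to call `record_checksums` when files are first stored, keep the result, and run `changed_files` against each copy on a schedule; any name it returns should be restored from another copy.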
11
Example: data storage and preservation at UKDA
Multi-copy, multi-storage media and multi-version resilience:
• preservation copy (UKDA)
• shadow copy (UKDA)
• dissemination copy to reduce load on main system
• near-site online copy (on campus)
• off-site online copy
• tape-based offline copy (UKDA)
Diagram labels: scheduled nightly; robotic 3-monthly
12
Good data management practice
• plan data management early
• assign roles and responsibilities
• design data management according to needs and purpose of research
• data management throughout research
13
Resources
• ESDS (2008). Guide to good practice: micro data handling and security. http://www.esds.ac.uk/news/publications/microDataHandlingandSecurity.pdf
• Finch, L. & Webster, J. (2008). Caring for CDs and DVDs. NPO Preservation Guidance. Preservation in Practice Series. London, National Preservation Office. Available at http://www.bl.uk/npo/pdf/cd.pdf
• UK Data Archive (2009). Manage and Share Data. http://www.data-archive.ac.uk/sharing/
See: http://www.data-archive.ac.uk/sharing/furtherstorage.asp