Preserving your research data for future use

12 December 2012

Staff and Departmental Development Unit

Leeds Research Data Management Pilot (RoaDMaP)

This work is licensed under a Creative Commons Attribution 3.0 Unported License, except pages 21-27.

1 Acknowledgements

This training course was developed as part of the University of Leeds, Leeds Research Data Management Pilot (RoaDMaP) project. The RoaDMaP project is funded through JISC’s Digital infrastructure: Research management programme within strand Research Data Management Infrastructure Projects.

The course has been developed in collaboration with Kerry Miller and Alex Ball of the Digital Curation Centre. Indeed much of the course material has been informed by the work of the Digital Curation Centre, www.dcc.ac.uk.

The starting points for the course were the courses developed by Jez Cope and Cathy Pink, Research360 project, University of Bath, entitled Managing Your Research Data, June 2011, http://blogs.bath.ac.uk/research360/2012/07/research-data-management-training-take-2/ (accessed 27/11/12), and by Ben Taylorson, University of Durham, entitled Managing your research data, April 2011, www.dur.ac.uk/library/research/training/archive/20112012/ (accessed 27/11/12).

Staff at Leeds who have contributed to the course development include Graham Blyth, Brenda Phillips, Rachel Proudfoot, Angela Newton and Dan Pullinger.

2 Tutors

Dr Jim Baxter, Senior Staff Development Officer (Research and Knowledge Transfer), SDDU.

Dr Graham Blyth, Research IT Manager, Faculty of Engineering.

Monica Duke, Institutional Support Officer, Digital Curation Centre, Bath.

3 Course Structure

The programme for the course is as follows.

Introduction 5 mins

Why research data management? 10 mins

The lifecycle of research data 15 mins

What is research data? 20 mins

Data management planning 20 mins

What makes a good research data management plan? 20 mins

Approaches to preserving and managing research data 15 mins

The beginnings of your research data management planning 10 mins

Summary 5 mins

Feedback (over lunch) 60 mins

4 Aim

The aim of the course is to raise awareness of the challenges of preserving and managing research data and approaches to addressing these challenges.

By the end of the session participants will be better able to:

Describe the forms research data takes and the role of contextual documentation and metadata in enabling data reuse

Describe how managing research data effectively will improve your research, save you time, decrease the risks of data loss and increase your professional impact, and identify tools to help

Describe University of Leeds and research funder data management expectations

Identify sources of information and guidance on managing research data effectively, including additional training courses

5 Why Manage Research Data?

What represents your research for you?

What represents your research for others?

What is the value of your research data to you?

What is the value of your research data to others?

How is research validated?

5.1 Aircraft Jet Engine Data

It can take 4 years to develop a new jet engine. During this time a plethora of data is created and used. This includes:

Specification and requirements data typically in textual format provided by the airframe manufacturer and passed on by the engine manufacturer to its subcontractors in such things as tender documents. These documents themselves will make reference to Civil Aviation Authority (CAA) requirements for certification of an engine.

A work breakdown structure where people and teams are assigned to design, develop and test an engine. Typically some form of tabulated data.

Physical data representing the shape of the parts that make up the engine in the form of computer aided design (CAD) files and drawings.

Functional and systems data in analytical, numerical and empirical form from such things as stress analysis, computational fluid dynamics (CFD) analysis and test results on prototype engines. The data is likely to be in the form of handwritten calculations, CFD grids in the form used by analysis software, and spreadsheets of test results. The specification, requirements, functional and systems data can make up 70% of a product’s data.

Bill of Materials (BoM): a list of every component and assembly in the engine, including the components and sub-assemblies that make up an assembly.

Manufacturing data about processes, tolerances, computer numerically controlled (CNC) machine files.

Configuration data providing information about the makeup of each specific engine, likely to be similar to the BoM but used to keep a record of the engine as it is used and serviced with parts being repaired, replaced and upgraded.

The business processes by which the engine is designed, developed, manufactured, tested, operated and maintained. These would typically be captured in an organisation’s quality systems.

The above data will need finding on occasion. The starting point for developing a new engine is typically an existing design. During design and development, changes to the specification and design result in many iterations of the data, and each iteration is likely to require the data from the previous iteration to be found. Engine certification will require data to be found and provided to the Civil Aviation Authority. Should there be an engine failure in service, the engine manufacturer needs a data trail to be in place to investigate the cause of the failure. This again requires data to be found.

The preceding description of the development of jet engines for aeroplanes is based on knowledge of the process developed during the IMI: Control and Access of Product Data Through Product Structures (CAPS) project, funded by the Engineering and Physical Sciences Research Council (EPSRC), grant number GR/K96953/01. A more detailed explanation of the challenges in capturing product data is available in Trott et al. 1999.

Another, larger project investigating the challenges of managing engineering information was Immortal Information and Through-Life Knowledge Management (KIM): Strategies and Tools for the Emerging Product-Service Paradigm, an EPSRC-funded Grand Challenge project led by the University of Bath, www.ukoln.ac.uk/projects/grand-challenge (accessed 10/12/12).

5.2 Reusing data

Do you still understand your older work?

Is the file structure / naming understandable to others?

Which data has been kept?

Which data was discarded?

How much was planned, how much was circumstance?

5.3 Reasons to manage your data

Extrinsic – requirements of those outside your research group including the funders of research and the wider community.

Responsible conduct of research - Effective management of research data is key to good research [RCUK Policy and Code of Conduct on the Governance of Good Research Conduct].

Funding body grant requirements – The management of research data is sufficiently important that some research funders set requirements for how it is managed. The Digital Curation Centre provides an Overview of funders' data policies, www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies (accessed 26/11/12).

Research integrity and replication

Intrinsic – the needs of you and the colleagues with whom you work now and in the future

Increase research efficiency

Save time and resources

Enhance data security

Prevent duplication of effort by enabling others to use your data

There is also evidence to suggest that making data available increases the citation rate of associated papers (Piwowar et al. 2007 show a 69% citation advantage; Henneken & Accomazzi 2011 show a 20% citation advantage). Furthermore, citation of the data in its own right is an emerging scholarly trend.

6 A Lifecycle for Research Data

A number of lifecycles for research data have been developed. These include the following:

Digital Curation Centre, Curation Lifecycle Model, www.dcc.ac.uk/resources/curation-lifecycle-model (accessed 26/11/12).

MANTRA’s www.docs.is.ed.ac.uk/docs/data-library/MANTRA_poster.pdf (accessed 26/11/12).

UK Data Archive, Research Data Lifecycle, http://data-archive.ac.uk/create-manage/life-cycle (accessed 26/11/12).

UKOLN, Infrastructure for Integration in Structural Sciences (I2S2) project, I2S2 Idealised Scientific Research Activity Lifecycle Model, www.ukoln.ac.uk/projects/I2S2/documents/I2S2-ResearchActivityLifecycleModel-110407.pdf (accessed 26/11/12).

Each of these models provides a context for managing research data and covers aspects of this such as planning the management of research data, preserving data and archiving data. The purpose of the following lifecycle for research data is to raise awareness of research data management. The preceding models are likely to be more useful once you are aware of the broad challenges of research data management. The following lifecycle has been developed based on the above lifecycles.

Figure 1. A lifecycle for research data.

Figure 1 shows a lifecycle for research data. Broadly, research data is collected, recorded, processed and research results are published.

[Figure 1, diagram not reproduced. Labels in the figure: Collect data; Record data; Process data; Publish research; Publish data; Active use, Archive, Preserve, Curate; Research method, protocols, form of metadata, research data management plan.]

The term record is used to cover saving and storing research data. The notion behind the use of record is that of making a written record. The processing of research data includes analysis and can be done by human or machine. It includes processing numbers in tables to produce charts and graphs as well as interpretation of the charts and graphs to find new knowledge and understanding.

Alongside publishing research (in a journal paper, say) comes publishing the research data on which that research is based.

Recorded data can be used in different time frames: immediately or in the near future where the data is worked on actively; in the medium term as part of writing research up for publication; in the long term research data might be reused or repurposed. The data might need archiving, preserving and curating. These terms are explored more below.

Archive – the Oxford English Dictionary (online version accessed 24/10/12) definition refers to: to place or store in an archive; in Computing, to transfer to a store containing infrequently used files, or to a lower level in the hierarchy of memories, esp. from disc to tape. From a research data management perspective this is about identifying and storing research objects for short, medium and long term use. Some data should be archived on day zero, such as primary data that will not change and will be difficult to recreate, for example CAT and MRI scans of bodies. The preceding considers archive as a verb; it can also be a noun, meaning a place for storing. A repository is an archive which supports reuse.

Preserve – the Oxford English Dictionary (online version accessed 24/10/12) definition refers to: to keep in its original or existing state; to make lasting. The notion here is ensuring data can be read in the future. This requires using file formats that will still be readable in the future and preventing the corruption or loss of data. Alternatively, files that include sufficient information to allow them to be decoded become independent of software and hardware.

Curate – the Oxford English Dictionary (online version accessed 24/10/12) definition refers to: to look after and preserve. This is about continued preservation. It is about checking files for corruption and repairing them. This can mean keeping multiple copies of files and checking for differences between them periodically. Broadly, checking what went in is what comes out.
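The periodic check described above can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended tool: it assumes SHA-256 checksums are an acceptable way of detecting change, and the function names (sha256_of, build_manifest, changed_files) are made up for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Record a checksum for every file under the folder (what went in)."""
    return {
        path.relative_to(folder).as_posix(): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def changed_files(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or no longer match the manifest (what comes out)."""
    current = build_manifest(folder)
    return sorted(
        name for name, checksum in manifest.items() if current.get(name) != checksum
    )
```

A manifest built when the data is archived can be stored alongside each copy; running changed_files against a copy at a later date shows which files would need repairing from another copy.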

Research method is typically associated with the way research data is collected and processed. The research method can also cover the recording of research data and how the research data is published. In this way the whole research data lifecycle is covered by the research method. Furthermore, the research method also forms part of the context for the research data and can itself be published.

There are benefits, as discussed above, in publishing research data. However, legislation such as the Data Protection Act, and commercial confidentiality, require that care is taken when publishing research data. Broadly, data which relates to individuals should be kept confidential. Some research may be commercially valuable and covered by confidentiality agreements. Research data that compromises such areas of confidentiality should not be published.

7 What is Research Data?

7.1 Research data

What are data?

The lowest level of abstraction from which information and knowledge are derived

Research data are collected, observed or created, for the purposes of analysis to produce and validate original research results

Both analogue and digital materials are data

Digital data can be:

o created in a digital form ("born digital")

o converted to a digital form (digitised)

Data types

Data type | Value | Example

Observational data captured around the time of the event | Usually irreplaceable | Sensor readings, telemetry, neuro-images, survey results

Experimental data from lab equipment | Often reproducible but can be expensive | Gene sequences, chromatograms, toroid magnetic field readings

Simulation data generated from test models | Model and metadata (inputs) more important than output data. Large models can take a lot of computer time to reproduce | Climate models, economic models

Derived or compiled data | Reproducible (but very expensive) | Text and data mining, compiled databases, 3D models

MANTRA provide an explanation of research data in their online training material, Research Data Explained, http://datalib.edina.ac.uk/mantra/researchdataexplained.html, (accessed 26/11/12).

7.2 Metadata

What does ix+viii=xi mean?

What is the date 110612?

Contextual information for data is called metadata

literally data about data

Data repositories & archives require some generic metadata, e.g.

author, title, publication date

For data to be useful, it will also need subject-specific metadata e.g.

reagent names, experimental conditions, population demographic

Record contextual information in a text file (such as a ‘read me’ file) in the same directory as the data e.g.

codes for categorical survey responses ‘999 indicates a dummy value in the data’

The UK Data Archive provides a more detailed explanation of metadata in Documenting your data, www.data-archive.ac.uk/create-manage/document, (accessed 26/11/12).

A standard for metadata is Dublin Core http://dublincore.org/documents/dces/ (accessed 08/11/12).

The MIT Libraries also provide guidance on metadata on their Data Management and Publishing, Documentation and Metadata web page, http://libraries.mit.edu/guides/subjects/data-management/metadata.html, (accessed 26/11/12).
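The question "What is the date 110612?" can be made concrete with a short Python sketch. It simply reads the same six digits under three common conventions; the function name and the choice of conventions are assumptions made for illustration.

```python
from datetime import date, datetime


def interpretations(raw: str) -> dict[str, date]:
    """Read the same six digits under three different date conventions."""
    formats = {
        "DDMMYY": "%d%m%y",  # day, month, year
        "MMDDYY": "%m%d%y",  # month, day, year
        "YYMMDD": "%y%m%d",  # year, month, day
    }
    return {
        label: datetime.strptime(raw, pattern).date()
        for label, pattern in formats.items()
    }


for label, value in interpretations("110612").items():
    print(label, "->", value.isoformat())
```

Each convention gives a different, equally valid date (11 June 2012, 6 November 2012 or 12 June 2011), so without metadata stating the convention the value cannot be interpreted reliably. Recording dates in an unambiguous form such as 2012-06-11, or stating the convention in a read me file, avoids the problem.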

7.3 What research data do you have?

What sort of research data do you have?

How do you record/save/keep your research data?

How do you process/analyse your research data?

What metadata do you have? (Would it be enough for someone else to use your data?)

8 What Should a Research Data Management Plan Cover?

Digital Curation Centre Checklist for a Data Management Plan (Jones 2011)

Be clear about who is responsible for what.

9 What makes a good research data management plan?

This section has been provided by the Digital Curation Centre.

Exercise: assessing a Data Management Plan

Researchers often assume that those reviewing grant proposals will know what we are implying with vague statements and that the acronyms and terms we use daily in our specific research areas are understood by all. There is no better way to learn what not to do when submitting a new bid than to take part in the evaluation of others' proposals. Similarly, a good way to get a grasp of what you might want to include in your future data management plans is to look over one and assess its merit as a reviewer might.

Breaking into groups of 4-5, look over the sample data management plans at:

- Annex A – Data Management Plan A

- Annex B – Data Management Plan B

A copy of the ESDS Data Management Plan Guidance for Peer Reviewers has been provided for reference at Annex C.

Assuming the role of ESRC bid reviewers, each group should consider whether the sample data management plans provide sufficient detail to enable the plans to be effectively assessed as part of a bid. Would you need more information in some areas to be able to determine how well the PI will be managing their data?

Groups will have 15 minutes to review the sample data management plans and related documents. Each group will be asked to report back on areas where additional information would be needed by reviewers to make an informed assessment.

10 Approaches to Preserving and Managing Research Data

Levels of research data management

At a basic level, research data needs documenting. A further refinement is to store the data in a structured way, for example using folders for computer files along with readme files that explain the contents of each folder. Further refinement still comes from using tools designed specifically for managing research data and its metadata.
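As an illustration of the folder-plus-readme level, the following Python sketch writes a plain-text readme listing the files in a folder. The file name README.txt, the layout of the readme and the function name write_readme are all assumptions for this example, not a prescribed format.

```python
from pathlib import Path


def write_readme(folder: Path, title: str, descriptions: dict[str, str]) -> Path:
    """Write README.txt in the folder, with one line per data file."""
    lines = [title, "", "Files in this folder:"]
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.name != "README.txt":
            # Flag files that nobody has described yet so the gap is visible.
            note = descriptions.get(path.name, "NOT YET DESCRIBED")
            lines.append(f"- {path.name}: {note}")
    readme = folder / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Flagging undescribed files, rather than silently omitting them, makes it easy to see where documentation is still missing.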

A checklist on planning data management is available from the following.

UK Data Archive, Create & Manage Data, Planning For Sharing, Data Management Checklist, www.data-archive.ac.uk/create-manage/planning-for-sharing/data-management-checklist, (accessed 27/11/12).

MIT Libraries, Data Management and Publishing, Data Planning Checklist, http://libraries.mit.edu/guides/subjects/data-management/checklist.html, (accessed 26/11/12).

Data format

The research data resulting from a project or study will have a format. This format will be suggested by the research method. The format may be dictated by the equipment being used to capture the data. Capturing research data in a consistent format will enable data to be more readily analysed. Where the data is being captured by multiple people in multiple locations, agreeing a format for the data at the beginning of the work will aid analysis of the whole data set at a later date.

Table 1 shows an example of the format for some data taken from sensors measuring temperature and pressure. Table 2 shows the beginnings of a format for data collected through questionnaires.

Date of reading | Time of reading | Person taking reading | Temperature (°C) | Pressure (kPa) | …

Table 1. Example of a format for readings taken from sensors.

Date of interview | Person interviewing | Name of interviewee | File containing interview notes | Ethnicity | …

Table 2. Example of a format for recording interviews.
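One way of keeping to an agreed format such as that in Table 1 is to write readings through a single routine that always uses the same columns. The Python sketch below is a minimal example; the column names and the function name write_readings are assumptions chosen to mirror Table 1.

```python
import csv

# Column names are illustrative, chosen to mirror the headings in Table 1.
FIELDS = ["date", "time", "person", "temperature_c", "pressure_kpa"]


def write_readings(rows, handle):
    """Write sensor readings as CSV, rejecting rows that break the agreed format."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        missing = [field for field in FIELDS if field not in row]
        if missing:
            raise ValueError(f"reading is missing fields: {missing}")
        # DictWriter itself raises ValueError if a row has unexpected extra fields.
        writer.writerow(row)
```

When saving to a file, open it with newline="" as the Python csv documentation advises. If everyone collecting data uses the same routine, readings gathered by different people in different locations can be combined without reformatting.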

Guidance on the format of data is available from the following.

UK Data Archive, Create & Manage Data, Documenting Your Data, Data Level, www.data-archive.ac.uk/create-manage/document/data-level, (accessed 27/11/12).

Broader advice on documenting data, linking the views of metadata and data, is available from the following.

UK Data Archive, Create & Manage Data, Documenting Your Data, www.data-archive.ac.uk/create-manage/document, (accessed 27/11/12).

File formats

When storing data in files you should consider the file format. Formats that can only be accessed by specialist software are only of use as long as the software is available. On the other hand, ASCII text files (.txt) and comma-separated values (.csv) files can be read by many pieces of software.

A project might recommend a set of file types to be used. For example, Timescapes, http://www.timescapes.leeds.ac.uk/archive/, has as standard formats XLS, CSV, JPG, PDF, PPT and RTF (all doc and docx files are saved as rich text format (RTF) files as this better supports longevity).

Guidance on file formats is available from the following:

UK Data Archive, Create & Manage Data, Formatting Your Data, File Formats & Software, www.data-archive.ac.uk/create-manage/format/formats, (accessed 26/11/12).

MIT Libraries, Data Management and Publishing, File Formats for Long-Term Access, http://libraries.mit.edu/guides/subjects/data-management/formats.html, (accessed 26/11/12).

Organising files

Structuring and naming files can aid in finding your data now and in the future.

The following is an example approach used by Timescapes for file naming and structuring:

1. Keep file names simple. Use a combination of abbreviations as pointers to what the file is, do not use any spaces, underscores or symbols, and try not to make the file names any longer than 20 characters.

2. An example file name structure: TsJsmith2WrkIntvBp.docx (each new part of the structure starts with a capital letter). The file names reflect elements of the research, i.e.

Ts (Timescapes as the name of the project)

Jsmith (the case of John Smith or an anonymised name)

2 (2nd wave/phase activity but could be the date/month)

Wrk (Work as being the Main theme/keyword that this particular John Smith file is referring to)

Intv (The research activity eg Interview, or Pic for photo, TL Timeline, RMP relationship map / or any variation on this approach as suits your data)

Bp (person’s initials denoting what stage the file/data is at)

3. Compile a table of the abbreviations used in forming file names.

4. This type of file name also suggests a folder structure. In this example the John Smith folder will hold all documentation relating to him, organised firstly into time/wave/phase periods and then within these time periods, the files covering the range of activities relating to Jsmith will be housed. Similarly even when a file is located outside of a bundled structure the file name will tell you the bulk of what you need to know.

The above approach is an example. Researchers should design their own file naming and structuring scheme in such a way that it allows easy organisation and finding of data.
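A naming scheme of this kind is easier to apply consistently if names are built and checked by a small routine. The Python sketch below follows the structure of the example name TsJsmith2WrkIntvBp.docx; the class and function names are hypothetical, and it assumes every element after the first letter is written in lower case, so abbreviations such as TL or RMP would need adapting.

```python
import re
from typing import NamedTuple

MAX_STEM_LENGTH = 20  # the example scheme suggests names of about 20 characters at most


class NameParts(NamedTuple):
    project: str
    case: str
    wave: int
    theme: str
    activity: str
    stage: str


# Each element starts with a capital letter; the wave is a number.
PATTERN = re.compile(
    r"^([A-Z][a-z]+)([A-Z][a-z]+)(\d+)([A-Z][a-z]+)([A-Z][a-z]+)([A-Z][a-z]+)$"
)


def build_name(parts: NameParts, extension: str = "docx") -> str:
    """Join the elements, capitalising each one, and check the result."""
    words = [parts.project, parts.case, str(parts.wave),
             parts.theme, parts.activity, parts.stage]
    stem = "".join(word.capitalize() for word in words)
    if not stem.isalnum():
        raise ValueError("file names must not contain spaces, underscores or symbols")
    if len(stem) > MAX_STEM_LENGTH:
        raise ValueError(f"file name is longer than {MAX_STEM_LENGTH} characters")
    return f"{stem}.{extension}"


def parse_name(file_name: str) -> NameParts:
    """Split a file name built by build_name back into its elements."""
    stem = file_name.rsplit(".", 1)[0]
    match = PATTERN.match(stem)
    if match is None:
        raise ValueError(f"{file_name!r} does not follow the naming scheme")
    project, case, wave, theme, activity, stage = match.groups()
    return NameParts(project, case, int(wave), theme, activity, stage)
```

For example, build_name(NameParts("TS", "jsmith", 2, "wrk", "intv", "bp")) gives TsJsmith2WrkIntvBp.docx, and parse_name recovers the elements from that name.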

Guidance on file naming is available from the following.

Digital Curation Centre, Standard Naming Conventions for Electronic Records, www.dcc.ac.uk/resources/external/standard-naming-conventions-electronic-records-0 (accessed 08/11/12).

University of Edinburgh, Records Management Section, Standard Naming Conventions For Electronic Records: The Rules, www.recordsmanagement.ed.ac.uk/InfoStaff/RMstaff/RMprojects/PP/FileNameRules/Rules.htm (accessed 08/11/12).

UK Data Archive, Create & Manage Data, Formatting Your Data, Organising Data, www.data-archive.ac.uk/create-manage/format/organising-data, (accessed 26/11/12).

MIT Libraries, Data Management and Publishing, Organizing Your Files http://libraries.mit.edu/guides/subjects/data-management/organizing.html, (accessed 26/11/12).

Data storage

You should consider what storage device is most appropriate to your needs and identify how much data you have to store. Broadly, electronic storage is achieved using magnetic or optical devices. Both can degrade; for example, CDs can get scratched. The UK Data Archive recommends that data are copied to new media between 2 and 5 years after they were first created. You should also check the integrity of the data.

Memory sticks should be used with caution as they are easily corrupted and lost. Indeed, anecdotally the Library and ISS have boxes of memory sticks which have been left in computers and elsewhere. For data protection reasons they are not allowed to look at the content of memory sticks. Could you identify your memory stick in a box of 100 others?

Guidance on storing data is available from the following.

UK Data Archive, Create & Manage Data, Storing Your Data, Storing Data, www.data-archive.ac.uk/create-manage/storage/store-data, (accessed 26/11/12).

Backup

You should also consider what the risk of losing your data is. Risk is a function of likelihood and consequence of something happening.

What is the risk of losing your data? The JISC-funded Sound Data Management Training (SoDaMaT) project has identified research, http://code.soundsoftware.ac.uk/attachments/605/JISC-Training.pdf (accessed 09/11/12), which suggests that: within education and research 1 in 10 laptops are lost within 3 years, a typical length for a research project, http://tinyurl.com/8c9m4bn; 1 in 5 laptops have significant problems during their lifetime, http://tinyurl.com/876qza5; as part of their repair procedures Google replace 1 in 5 of their hard disk drives within 4 years, http://tinyurl.com/octz6b.

What is the consequence of losing your data? It might compromise a whole research project and reduce the likelihood of publishing papers.

The risk of losing research data, if nothing is done to mitigate the above, is therefore high.

To mitigate against this risk you should back up your data. You should think about how frequently you back up and where you keep the back up. If you keep your backup and computer in the same place and there is a fire then you are likely to lose your data.
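The effect of keeping a backup on the likelihood of loss can be illustrated with some simple arithmetic, sketched here in Python. The calculation assumes that each copy is lost independently of the others, which is exactly what fails when the backup and the computer are kept in the same place; the function name and the loss figure used for the backup are hypothetical.

```python
import math


def chance_of_losing_every_copy(loss_probabilities):
    """Probability that all copies are lost, assuming the losses are independent."""
    probabilities = list(loss_probabilities)
    if not probabilities:
        raise ValueError("at least one copy is needed")
    for probability in probabilities:
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    return math.prod(probabilities)


# A laptop alone, using the 1 in 10 loss figure quoted for a 3-year project.
print(chance_of_losing_every_copy([0.1]))

# The laptop plus one backup kept elsewhere; 0.1 for the backup is an
# illustrative assumption, not a figure from the research cited.
print(chance_of_losing_every_copy([0.1, 0.1]))
```

Under these assumptions a single independent backup reduces the chance of losing everything from about 1 in 10 to about 1 in 100. The consequence of a loss is unchanged, so the overall risk falls only because the likelihood does.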

Guidance on backing up is available from the following.

University of Leeds, Policy on Safeguarding Data, http://iss.leeds.ac.uk/info/362/policies/782/policy_on_safeguarding_data (accessed 08/11/12), provides guidance on backing up data.

UK Data Archive, Create & Manage Data, Storing Your Data, Backing-up, www.data-archive.ac.uk/create-manage/storage/back-up, (accessed 26/11/12).

University of Leeds Policy

Policy on safeguarding data http://iss.leeds.ac.uk/info/362/policies/782/policy_on_safeguarding_data (accessed 08/11/12).

Policy on Research Data Management http://library.leeds.ac.uk/research-data-policies#activate-tab1_university_research_data_policy (accessed 08/11/12). Also included in Annex D.

Data Protection Code of Practice www.leeds.ac.uk/secretariat/data_protection_code_of_practice.html (accessed 08/11/12).

What is your experience of finding files?

11 Planning your data management

The research data you have, how it is recorded and its metadata were identified above in Section 7.3. The following space should be used to consider your broader research data management needs. The questions are intended to help you identify the current state of how you manage your research data and to get you thinking about how you will improve the way you manage your research data. These questions are based on elements in the Digital Curation Centre Checklist for a Data Management Plan. The questions are not a complete set.

1. What funding body requirements should you fulfil?

2. How do you intend to go about developing your (individual and research group) data management plan?

3. What ethical and privacy issues do you have and how will they be addressed?

4. Who is interested in your research or has a stake in it and why?

5. Who would be interested in research data?

6. How much data requires short term storage?

7. What security is required?

8. Who is responsible for creating the research data?

9. Who is responsible for storing the research data?

10. Who is responsible for how you or your research group manage your research data (responsible for the research data management plan)?

11. How will you decide which data is preserved?

12. Where will the data you identify for preservation be stored?

13. What is the budget for data storage and preservation?

14. What is your next step to improve your research data management?

12 Summary

“Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.” This is the first principle in the RCUK Common Principles on Data Policy.

Managing research data is part of good research practice.

The data allows you to justify your research findings.

It enables you to find and reuse your research data more easily.

Managed data can be shared with others.

A research data management plan helps in achieving this.

Be clear about who is responsible for research data and its management.

13 References

Digital Curation Centre, www.dcc.ac.uk, (accessed 27/11/12).

Digital Curation Centre, Overview of funders' data policies, www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies (accessed 26/11/12).

Digital Curation Centre, Resources for digital curators, www.dcc.ac.uk/resources, (accessed 27/11/12)

Henneken, E.A. & Accomazzi, A. Linking to Data - Effect on Citation Rates in Astronomy. Astronomical Data Analysis Software and Systems XXI: p.763-766 http://arxiv.org/abs/1111.3618, 2011.

Jones, S. ‘How to Develop a Data Management and Sharing Plan’. DCC How-to Guides. Edinburgh: Digital Curation Centre, 2011. Available online: www.dcc.ac.uk/resources/how-guides, (accessed 26/11/12).

Massachusetts Institute of Technology (MIT) Libraries, Data Management and Publishing, http://libraries.mit.edu/guides/subjects/data-management/index.html (accessed 26/11/12).

Piwowar H.A., Day R.S., Fridsma D.B. Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308, 2007.

Research Councils UK, RCUK Common Principles on Data Policy, www.rcuk.ac.uk/research/Pages/DataPolicy.aspx (accessed 10/12/12).

Research Councils UK, RCUK Policy and Code of Conduct on the Governance of Good Research Conduct, Integrity, Clarity and Good Management, October 2011, www.rcuk.ac.uk/documents/reviews/grc/goodresearchconductcode.pdf (accessed 26/11/12).

S. T. Trott, J. E. Baxter, A. McKay, B. W. Henson, and A. de Pennington. Supporting product introduction processes through product structures. In Design Engineering Technical Conferences 1999. ASME, DETC-99/DTM-8745, 12-16 September 1999, ISBN 0-7918-1967-1.

Timescapes, an ESRC Qualitative Longitudinal Initiative, http://www.timescapes.leeds.ac.uk/ (accessed 26/11/12).

UK Data Archive, Create and Manage Data, www.data-archive.ac.uk/create-manage, (accessed 26/11/12).

University of Leeds, Policy on safeguarding data http://iss.leeds.ac.uk/info/362/policies/782/policy_on_safeguarding_data (accessed 08/11/12).

University of Leeds, Policy on Research Data Management http://library.leeds.ac.uk/research-data-policies#activate-tab1_university_research_data_policy (accessed 08/11/12).

University of Leeds, Data Protection Code of Practice www.leeds.ac.uk/secretariat/data_protection_code_of_practice.html (accessed 08/11/12).

Researchers should also make sure they know of the data management requirements of their research funders.

[Annex A - Data Management Plan A]

Socio-technical Systems and Call Centres: a Case Study Investigation

This project is not yet funded.

Funding body: ESRC

Lead organisation: University of X

Other organisations: Financial call centre A; Financial call centre B

Project dates: 02 Jan 2012 to 30 Apr 2012

Budget: £25,000.00

1 Existing data sources

1.1 An explanation of the existing data sources that will be used by the research project (with references).

2.2.2 What existing datasets could you use or build upon?

N/A

2 Gaps between the currently available and required data

2.1 An analysis of the gaps identified between the currently available and required data for the research.

2.3.1 Why do you need to capture/create new data?

There are currently no data available that facilitate my research objectives.

2.4.1 What is the relationship between the new dataset(s) and existing data?

N/A

3 Information on the data that will be produced by the research project

3.1 Data volume and data type, e.g. qualitative or quantitative data

2.1 Give a short description of the data being generated or reused in this research

35 semi-structured interviews will be carried out with financial call centre managers, employees and customer service representatives (CSRs). The interviews will be audio-taped.

3.2 Data quality, formats, standards documentation and metadata

2.3.3 Which file formats will you use, and why?

I will make use of standard formats for my dataset and audio files.

2.3.4 What criteria will you use for Quality Assurance/Management?

Interview participants will have the chance to sign-off on the audio transcripts to ensure that the information is accurate.

2.5.1 Are the datasets which you will be capturing/creating self-explanatory, or understandable in isolation?

Yes

2.5.2 If you answered No to DCC 2.5.1, what contextual details are needed to make the data you capture or collect meaningful?

N/A

3.3 Methodologies for data collection

2.3.2 Describe the process by which you will capture/create new data

The interviews will be carried out face to face. Transcripts will be created following the interview.

4 Quality assurance and back-up procedures

4.1 Planned quality assurance and back-up procedures (security/storage)

5.2.1 How will you back-up the data during the project's lifetime?

The data will be stored on the PI's laptop during the life of the project. Once the project has been completed, the data will be deposited with ESDS.

5.3.1 How will you manage access restrictions and data security during the project's lifetime?

Only the PI will have access to the data during the project.

5.3.3 Give details of any other security issues. N/A

5 Management and archiving of collected data

5.1 Plans for management and archiving of collected data

6.1 What is the long-term strategy for maintaining, curating and archiving the data?

The data will be deposited with the ESDS upon completion of the project.

6 Difficulties in data sharing

6.1 Expected difficulties in data sharing, along with causes and possible measures to overcome these difficulties. (You may wish to include explicit mention of consent, confidentiality, anonymisation and other ethical considerations.)

3.1.1 Are there ethical and privacy issues that may prohibit sharing some or all of the dataset(s)?

Yes

3.1.2 If you answered Yes to DCC 3.1.1 How will these be resolved?

Transcripts will only be seen by the participants involved in the interview and the PI. The transcripts for deposit to ESDS will be anonymised.

7 Copyright and intellectual property

7.1 Copyright and intellectual property ownership of the data

3.2.1 Will the dataset(s) be covered by copyright or the Database Right? If so give details in DCC 3.2.2, below.

Yes

3.2.2 If you answered Yes to DCC 3.2.1 Who owns the copyright and other Intellectual Property?

The PI owns the copyright for this data.

8 Responsibilities for data management and curation

8.1 Responsibilities for data management and curation within research teams at all participating institutions

7.1 Outline the staff/organisational roles and responsibilities for implementing this data management plan.

The PI will be responsible for all aspects of data management during the life of the project. ESDS will be responsible for long-term management and curation of this data.

Signature _______________________________ Date ______________________________

Print name ______________________________ Role/institution ______________________

Signature _______________________________ Date ______________________________

Print name ______________________________ Role/institution ______________________

Signature _______________________________ Date ______________________________

Print name ______________________________ Role/institution ______________________

[Annex B - Data Management Plan B]

ESRC-DFID Example Data Management Plan

http://www.esrc.ac.uk/_images/Example-Data-Management-Plan_tcm8-20657.pdf

Existing data

The research objectives require qualitative data that are not available from other sources. Some data exist that can be used to situate and triangulate the findings of the proposed research (eg, surveys of poverty impacts; opinion polls), and which will supplement data collected as part of the proposed research. However, qualitative and attitudinal data are generally rare or of insufficiently high quality to address the research questions.

The research objectives also require quantitative analysis of public data. Some quantitative data are available, but they are insufficiently detailed. In their current form, they would not permit as full a comparison across the cases as is desirable.

Information on data

For these reasons, the research project involves primary data collection: 1) public data; 2) semi-structured interviews; and 3) focus group discussions with people identified through profiling techniques:

1. Public data

Where possible, we will use online and/or electronic archives. This will involve extracting and processing quantitative data, including participants, objectives and outcomes. Key search terms and their translation into the relevant languages, inclusion and exclusion criteria for items, variable codes and metadata will be refined and agreed in the inception phase of the project.

Preliminary searches indicate that a sufficiently detailed dataset can be generated. The junior researchers will log their progress, documenting potentially contentious categorising decisions, difficulties faced in categorising items, and qualitative insights which do not fit the spreadsheet format. Data will be inputted and stored in a widely available spreadsheet format (eg Excel or SPSS), to ensure accessibility to Southern researchers.

2. Semi-structured interviews with individuals

The team anticipates undertaking 25-40 semi-structured interviews in each country from a sample frame to be developed in Phase 2. Data will be collected and stored using digital audio recording (eg MP3) where interviewees permit. In case they do not, interviews will be undertaken in pairs to enable detailed note-taking. Interview notes will be typed up according to agreed formats and standards. Where interviews are taped and in English, the UK research assistant will assist with transcription.

3. Focus group discussions matched to profiles

The sample frame for the focus group participants will be derived from public data. Numbers of focus groups will depend on geographical and other variations in patterns; how quickly a robust pattern of findings emerges; and the scope for identifying and convening the appropriate groups. Focus groups will involve two researchers, and be conducted in the vernacular. Whether recorded or not, the event will be transcribed or documented using agreed formats and standards for handling the issue of multiple voices, interruptions, labelling of participatory and visual activities, and so on.

All transcripts will be in Microsoft Word. All the researchers (except the UK research assistant) will be reasonably fluent in both English and the main language in which interviews and focus groups will be conducted, so that transcriptions will be translated into English only where the researcher is fluent in both languages and better able to transcribe in English, or to enable analysis of particular sections of the text. This will avoid unnecessary cost.

During the inception Phase 2, the metadata, procedures and file formats for note-taking, recording, transcribing, storing visual data from participatory techniques, and anonymising semi-structured interview and focus group discussion data will be developed and agreed. Focus group and interview transcripts will be coded in NVivo or a qualitative software suited to the different languages; the most appropriate software for a comparative multi-language study has not yet been identified.

Quality assurance

The PI will be responsible for overall quality assurance, with lead country researchers and the UK research assistant undertaking specific activities to ensure quality control. Detailed protocols for extracting data from secondary sources will be developed, piloted, refined and agreed in Phase 2.

Quality will be assured through routine monitoring by the lead country researcher, and periodic cross-checks against the protocols by the UK-based research assistant. While interview and focus group protocols are being developed in Phase 2, standards and systems for note-taking, recording (if possible), transcribing and storing visual data from participatory techniques such as drawings, photographs and video, use of metadata, systems for downloading and storing SMS data (a potential follow-up research tool) will also be defined.

Focus groups and interviews will always involve two researchers. Quality control for the qualitative data collection will be assured through refresher focus group discussion training during research design workshops and to junior researchers, where appropriate. Either the UK Institution or lead country researcher will check through each transcript for consistency with agreed standards. Where translations are undertaken, quality will be assured by one other researcher fluent in that language checking against the original recording or notes.

Backup and security

Our data will need to be backed up regularly; because of likely problems with viruses and hardware in developing countries, this will include regular email sharing with the UK research assistant, so that up-to-date versions are stored on the UK Institution's server. Qualitative data will be backed up and secured by the lead country researcher on a regular basis and metadata will include clear labelling of versions and dates. There are some potential sensitivities around some of the data being collected, so the project will establish a system for protecting data while it is being processed, including use of passwords and safe back-up hardware.
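As an illustration only (it is not part of the ESRC-DFID example plan), the idea of backing up files with clear labelling of versions and dates could be sketched in Python as follows. The file naming pattern, the function names and the use of a SHA-256 checksum to confirm that the copy matches the original are all assumptions made for this sketch.

```python
import hashlib
import shutil
import tempfile
from datetime import date
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, backup_dir: Path, version: int, on: date) -> Path:
    """Copy source into backup_dir under a name recording version and date.

    The pattern <stem>_v<NN>_<YYYY-MM-DD><suffix> is a hypothetical
    convention chosen for this sketch, not one taken from the plan.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    name = f"{source.stem}_v{version:02d}_{on.isoformat()}{source.suffix}"
    target = backup_dir / name
    if target.exists():
        # Never silently overwrite an earlier version.
        raise FileExistsError(f"backup already exists: {target}")
    shutil.copy2(source, target)
    # Confirm that the copy is byte-for-byte identical to the original.
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"checksum mismatch after copying to {target}")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "interview_notes.txt"
        original.write_text("example transcript text")
        copy = back_up(original, Path(tmp) / "backups", 1, date.today())
        print(copy.name)
```

Refusing to overwrite an existing backup keeps every earlier version intact, and the checksum comparison gives a simple check that a copy has not been corrupted in transit.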

Ethical issues

A letter explaining the purpose, approach and dissemination strategy (including plans to share data) of the research, and an accompanying consent form (including to share data) will be prepared and translated into the relevant languages.

A clear verbal explanation will also be provided to each interviewee and focus group participant. Commitments to ensure confidentiality will be maintained by ensuring that recordings are not shared, that transcripts are anonymised, and that details that can be used to identify participants are removed from transcripts or concealed in write-ups.

As the highly-focused nature of the research means that many participants may be easily identifiable despite efforts to ensure anonymity or confidentiality, where there is such a risk, participants will be shown sections of transcript and/or report text to ensure they are satisfied that no unnecessary risks are being taken with their interview data.

Interviews with elite policymakers will not guarantee confidentiality unless this is requested, as interviewees will be expected to speak in their official capacities or institutional roles. However, as is often the case, interviewees may be more comfortable if some sections of their interview are not recorded or made public. In such circumstances, recording will be paused or sections of text will be expunged from shared transcripts, and an indication made that this is the case.

Expected difficulties in data sharing

Not all of the transcripts will be translated into English (see above), which will limit the accessibility of the data.

Copyright/Intellectual Property Right

The institutional partners will jointly own the data generated. Online and archival sources will be cited and clearly acknowledged in the database and research outputs. Permission will be sought from secondary sources to share the findings of the research on public websites.

Responsibilities

The PI will direct the data management process overall, with the UK research assistant responsible for ensuring metadata production, day-to-day cross-checks, back-up and other quality control activities are maintained. The lead country researchers will be responsible for routine supervision of the dataset development.

Data extraction, processing and inputting for the dataset will be undertaken by the in-country junior researchers. The UK Institution, lead country and junior researchers will share responsibilities for collecting and transcribing focus group and interview data, with the UK research assistant supporting as necessary. The PI will be finally responsible for dealing with quality and sharing and archiving of data.

Preparation of data for sharing and archiving

The most appropriate means of sharing the data generated through the project will be online, through institutional websites. The project will have a dedicated space on the UK Institutional website to facilitate this, and all other involved institutions will also be encouraged to host the data on their websites.

[Annex C - ESDS Data Management Plan Guidance for Peer Reviewers http://www.esrc.ac.uk/_images/Data-Management-Plan-Guidance-for-peer-reviewers_tcm8-15569.pdf ]

Data Management Plan: Guidance for peer reviewers

What is a data management plan?
What do I need to check?
Assessment of existing data
Information on new data
Quality assurance of data
Backup and security of data
Expected difficulties in data sharing
Copyright/intellectual property right
Responsibilities
Preparation of data for sharing and archiving

What is a data management plan?

A data management plan should incorporate data management into the research cycle to ensure that generated data can be made available and re-used to the maximum extent possible at the end of a grant, in line with the ESRC Research Data Policy (http://www.esrc.ac.uk/about-esrc/information/data-policy.aspx). A data management plan is mandatory in all applications planning to generate data, except for applicants applying for studentships. Most data generated as a result of economic and social research can be successfully archived and shared. However, some research data are more sensitive than others. It is the grant holder’s responsibility to consider all issues related to confidentiality, ethics, security and copyright before initiating the research. Any challenges to data sharing (eg copyright or data confidentiality) should have been critically considered in a plan, with possible solutions discussed to optimise data sharing.

What do I need to check?

Please assess the quality of the proposed data management plan and comment on whether appropriate and realistic consideration has been given to data management requirements to maximise data sharing, and whether the requirements are justified according to the proposed research. The data management plan is an integral part of the application and should be considered in the context of information presented in the Case for Support and Justification for Resources. We have given suggestions below for each of the points included in a data management plan that may help you to evaluate whether the plan is fit for purpose.

For more guidance please refer to the ESDS data management guides (http://www.esds.ac.uk/support/datamanguides.asp) and the UK Data Archive’s Managing and Sharing Data guide (http://www.data-archive.ac.uk/create-manage). Any suggestions for improvements to data management plans are welcome and will be fed back to applicants. If you do not feel competent to comment on data management, please select ‘Unable to assess’.

Assessment of existing data

You may want to consider the following questions:

• Is there evidence that secondary sources of data have been considered and evaluated?

• Is there evidence presented that the project is not creating new data when there are existing resources that could be re-used?

• If existing data are used, have issues such as copyright or IPR of such data been considered and possible copyright clearance obtained to be able to share data or data derived thereof?

Information on new data

You may want to consider the following questions:

• Is the information on data to be produced adequate and realistic and according to the research and methodology proposed in the application?

• Is there evidence that the plan covers all data that is planned to be generated from the research?

• Is sufficient information given on how data will be collected and in which formats (eg Open Document Format, tab-delimited, Excel etc) data will be analysed and stored, as well as an indication of how they will be documented?

Quality assurance of data

You may want to consider the following questions:

• Is information given on procedures for quality assurance that will be carried out on the data collected? (Please refer to the Case for Support for full information on quality control of the proposed research.) This could include methods for data validation or standards applied during data collection and data entry, codes of research practice adhered to, transcription templates used, etc.

• Are no quality assurance procedures mentioned when there is a clear need from the proposed research that there should be? Please note that quality issues are to be addressed at the time of data collection, data entry, digitisation or data checking.

Backup and security of data

You may want to consider the following questions:

• Is the data back-up procedure described fit for purpose? eg considering back-up procedures for all institutions

• Are methods of version control described? (ie making sure that if the information in one file is altered, the related information in other files is also updated, as well as keeping track of the number of versions and their locations)

Expected difficulties in data sharing

You may want to consider the following questions:

• Have all obstacles to sharing data been considered?

• Have strategies been considered for dealing with these issues? For example by:

o discussing data sharing and re-use with interviewees and gaining specific consent from participants to share research data

o anonymising data to remove personal and disclosive information

o regulating access to data

If there are ethical issues which may cause difficulties in data sharing, strategies for dealing with these issues should be discussed in the relevant section in the Je-S form. In assessing this part of the application you may want to refer to the requirements of the ESRC Framework for Research Ethics (www.esrc.ac.uk/researchethics). If newly generated data cannot be shared, adequate justification should be given. It may be the case that parts of the data that are sensitive cannot be shared, but this should be considered critically and the plan should provide evidence that it has been assessed from all angles. We regard a waiver of deposit as an exception, and reserve the right to refuse waivers where there is insufficient evidence that the applicant has fully explored all strategies to enable data sharing and archiving.

Copyright/intellectual property right

You may want to consider the following questions:

• Is copyright of research data (both existing sources of data used or created) agreed or clarified, especially for collaborative research or if various sources of data are combined?

• Are plans in place for copyright clearance for data sharing (if possible)?

Responsibilities

You may want to consider the following questions:

• Have data management responsibilities been allocated to named individuals?

• Is there evidence that data management will be followed throughout the course of the project?

• Has consideration been given to the variety of data management tasks that may be required for the research?

• For collaborative research, are data management responsibilities allocated at each partner organisation (if needed for the research) or has the coordination of data management responsibilities across partners been considered?

For further information please refer to the information provided within the Staff Duties section in the Je-S form and where appropriate in the Justification of Resources.

Preparation of data for sharing and archiving

The following questions may be considered when assessing this section of the plan:

• Are the plans for preparing and documenting data for sharing and archiving with the Economic and Social Data Service appropriate?

• Is there evidence that data will be well documented during research to provide high-quality contextual information and/or structured metadata for secondary users? eg documenting the method of data collection, origin, circumstances, processing and analysis of data.

University of Leeds Research Data Management Policy (July 2012)

The management of Research Data reflects our:

• commitment to research excellence

• recognition of our duty to our funders

• appreciation of the value of our data - to us and to others

1. Research data will be managed to agreed standards throughout the research data lifecycle and according to funder requirements.

2. Responsibility for research data management during any research project or programme lies with responsible owners such as Principal Investigators (PIs).

3. The University is responsible for the provision of training, support and advice on research data management.

4. A data management plan that explicitly addresses the capture, management, integrity, confidentiality, preservation, sharing and publication of research data must be created for each proposed research project or funding application. Sufficient metadata shall also be created and stored to aid discovery and re-use. Data management plans should take account of and ensure compliance with relevant legislative frameworks which may limit public access to the data (for example, in the areas of data protection, intellectual property and human rights).

5. All research data should be offered and assessed for deposit and preservation in an appropriate University, national or international data service or domain repository, unless specified otherwise in the data management plan.

6. Data should not be deposited with any organisation that does not commit to its access and availability for re-use, unless this is a condition of the project funding or arising from other requirements.

7. At the completion of each research project, the PI should ensure that all relevant research data are made available, subject to meeting appropriate requirements, in the location specified in the data management plan.

8. Research and Innovation Board will be responsible for reviewing and updating the policy.

The University recognises the following benefits of implementing this policy:

a. support for the re-use of data

b. benefit future generations

c. improved data integrity, security and access management

d. opportunities for further research collaboration

e. improved research reproducibility and validation

f. further development of research skills

g. the ability to cite data as a publication

h. improved institutional research reputation

i. improved relationship with research funders

[Annex D - University of Leeds Research Data Management Policy (July 2012)]
