
Research Data Management and Librarians

Date post: 22-Nov-2014
Upload: johann-van-wyk
Description:
This presentation was delivered at the Elsevier Library Connect Seminar on 6 October 2014 in Johannesburg, 7 October 2014 in Durban and 9 October 2014 in Cape Town and gives an overview of the potential role that librarians can play in research data management
Research Data Management and Librarians Presentation at Elsevier Library Connect Seminar, 6 October 2014, Johannesburg, 7 October 2014, Durban and 9 October 2014, Cape Town By Johann van Wyk (University of Pretoria)
Transcript
Page 1: Research Data Management and Librarians

Research Data Management and Librarians

Presentation at Elsevier Library Connect Seminar, 6 October 2014, Johannesburg, 7 October 2014, Durban and 9 October 2014, Cape Town

By Johann van Wyk (University of Pretoria)

Page 2: Research Data Management and Librarians

Introduction

Internationally research data is increasingly recognised as a vital resource whose value needs to be preserved for future research. This places a huge responsibility on Higher Education Institutions to ensure that their research data is managed in such a manner that they are protected from substantial reputational, financial and legal risks in the future. Librarians have a unique skillset to help these institutions navigate this complex environment. This presentation will highlight a number of potential roles librarians could play.

Page 3: Research Data Management and Librarians

Research Data Management: A (Brave) Complex New World

DOI

Visualisation

RDM

Data Preservation

Big Data

Linked Data

DMP

Data Policy

Data Repository

Data Journals

Open Data

Data Anonymisation

Copyright License

Data Formats

Messy

Complex

Small Data

Various formats

Various devices

Various versions

Sensitive Data

Page 4: Research Data Management and Librarians

What is meant by Research Data?

Research data, unlike other types of information, is collected, observed, created or generated, for purposes of analysis to produce original research results

http://www.docs.is.ed.ac.uk/docs/data-library/EUDL_RDM_Handbook.pdf

Page 5: Research Data Management and Librarians

What is research data management?

• “the process of controlling the information generated during a research project”

• “Managing data is an integral part of the research process. How data is managed depends on the types of data involved, how data is collected and stored, and how it is used - throughout the research lifecycle”.

http://www.libraries.psu.edu/psul/pubcur/what_is_dm.html

Page 6: Research Data Management and Librarians

Why Manage Research Data?

By managing research data you will:

• Meet funding body grant requirements, e.g. NSF, NIH;

• Meet publisher requirements;

• Ensure research integrity and replication;

• Ensure research data and records are accurate, complete, authentic and reliable;

• Increase your research efficiency;

• Save time and resources in the long run;

• Enhance data security and minimise the risk of data loss;

• Prevent duplication of effort by enabling others to use your data;

• Comply with practices conducted in industry and commerce; and

• Protect your institution from reputational, financial and legal risk.

Page 7: Research Data Management and Librarians
Page 8: Research Data Management and Librarians

Designing Data Management Plans

A Data Management Plan is “a formal document that outlines what you will do with your data during and after you complete your research” (The University of Virginia Library, 2014).

Data Management Planning Tools:

• Data Management Planning Tool (DMPTool) https://dmptool.org/ (University of California Curation Center of the California Digital Library)

• DMPonline tool https://dmponline.dcc.ac.uk/ (Digital Curation Centre, UK)

Creating Data

Librarians can play an advisory role

Page 9: Research Data Management and Librarians

Data Capture/Collection

The action or process of “gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes” (Responsible conduct of research, n.d.; The Oxford Dictionary, 2014).

Examples of data collection methods: observations, textual or visual analysis, interviews, focus group interviews, surveys, tracking, experiments, case studies, literature reviews, questionnaires, data from sensors, model outputs, scenarios, etc.

Creating Data

Librarians can play their traditional role of information searching, training and consultation

Page 10: Research Data Management and Librarians

Data Storage and Backup

Data storage is the process of “preservation of data files in a secure location which can be accessed readily” (Research Data Services, University of Wisconsin-Madison, 2014)

Data Backup is the process of “preserving additional copies of your data in a separate physical location from data files in storage”.
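One way to check that a backup copy still matches the file held in storage is to compare checksums. The sketch below is illustrative only: the function names are hypothetical, and SHA-256 is an assumed choice of algorithm rather than something the cited source prescribes.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, backup: Path) -> bool:
    """True when the backup copy is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(backup)
```

Running such a check on a schedule, and not only at the moment of copying, is what allows silent corruption of either copy to be noticed.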

Creating Data

Processing Data

Analysing Data

Librarians can advise researchers on File Naming Conventions
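A file naming convention is easier to follow when it is generated rather than typed. The sketch below assumes a hypothetical convention of project, short description, date (YYYYMMDD) and two-digit version; the pattern and the function name are illustrative assumptions, not a standard taken from the presentation.

```python
import re
from datetime import date


def build_filename(project: str, description: str, created: date,
                   version: int, extension: str) -> str:
    """Build a name like 'icmm_cell-counts_20141006_v02.csv'."""
    def slug(text: str) -> str:
        # Lower-case, and collapse anything that is not a letter or digit into a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (f"{slug(project)}_{slug(description)}_"
            f"{created:%Y%m%d}_v{version:02d}.{extension.lstrip('.').lower()}")


name = build_filename("ICMM", "Cell counts", date(2014, 10, 6), 2, ".CSV")
```

Putting the date in year-month-day order means that an alphabetical listing of files is also a chronological one.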

Page 11: Research Data Management and Librarians

Metadata Creation

• Metadata is searchable, standardised and structured “information that describes a dataset” and explains “the aim, origin, time references, geographic location, creating author, access conditions and terms of use of a data set”

(Corti et al., 2014: 38; USGS Data Management Website, 2014)

• Examples: Dublin Core Metadata Element Set; ISO 19115:2003(E), Geographic Information Metadata; PREMIS
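To make the idea of a structured metadata record concrete, the sketch below writes a few Dublin Core elements as XML using Python's standard library. The wrapper element name `metadata`, the function name and the sample values are assumptions for illustration; only the element names (title, creator, date, rights) and the namespace come from the Dublin Core Metadata Element Set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialise a dict of Dublin Core element names and values as XML."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values are invented for the example.
record = dublin_core_record({
    "title": "Example survey dataset",
    "creator": "Researcher, A.",
    "date": "2014-10-06",
    "rights": "CC BY 4.0",
})
```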

Creating Data

Processing Data

Analysing Data

Preserving Data

Librarians, especially cataloguers, have the skill set to assist with metadata creation and to advise researchers

Page 12: Research Data Management and Librarians

Data Cleansing, Verification & Validation

• Data Cleansing

“refers to identifying incomplete, incorrect, inaccurate, irrelevant, etc. parts of the data and then replacing, modifying, or deleting this dirty data” (Wikipedia)

• Data Verification

“the process of evaluating the completeness, correctness, and compliance of a dataset with required procedures to ensure that the data is what it purports to be”. This can be done by persons “who are less familiar with the data”, for example librarians.

(Martin and Ballard, 2010: 8-9; US EPA, 2002:7)

• Data validation

process “to determine if data quality goals have been achieved and the reasons for any

deviations. Validation checks that the data makes sense”.

(Martin and Ballard, 2010: 8; US EPA 2002:15).
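A check that "the data makes sense" can be as simple as testing for missing values and implausible ranges. The sketch below is a minimal illustration under assumed rules: the field names, the ranges and the function name are hypothetical and are not drawn from the cited guidance.

```python
def validate_rows(rows, required, ranges):
    """Return (row_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    for number, row in enumerate(rows, start=1):
        # Completeness: every required field must have a value.
        for field in required:
            if row.get(field) in (None, ""):
                problems.append((number, f"missing {field}"))
        # Plausibility: numeric fields must fall inside the agreed range.
        for field, (low, high) in ranges.items():
            value = row.get(field)
            if value in (None, ""):
                continue  # nothing to range-check
            if not low <= float(value) <= high:
                problems.append((number, f"{field} out of range: {value}"))
    return problems


# Invented sample rows, with values as strings as they would arrive from a CSV file.
rows = [
    {"id": "p01", "age": "34", "temp_c": "36.8"},
    {"id": "p02", "age": "", "temp_c": "36.5"},
    {"id": "p03", "age": "210", "temp_c": "37.1"},
]
issues = validate_rows(rows, required=["id", "age"],
                       ranges={"age": (0, 120), "temp_c": (30.0, 45.0)})
```

Reporting problems instead of silently correcting them leaves the decision to replace, modify or delete a value with the researcher.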

Processing Data

Analysing Data

Page 13: Research Data Management and Librarians

Data anonymisation

Data anonymisation is “the process of de-identifying sensitive data, while preserving its format and data type” (Raghunathan, 2013: 4).

Anonymisation Techniques - Examples: Generalisation, Suppression, Permutation, Perturbation, Substitution, Shuffling, Number and Date Variance, Nulling-out (Charles, 2012; Cormode and Srivastava, 2009; Raghunathan, 2013: 172-182; Simpson, n.d.; Vinogradov and Pastsyak, 2012: 163).
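Four of the techniques named above can be sketched in a few lines each. These are simplified illustrations with hypothetical function names and invented field names, not the implementations described in the cited sources. Applying them does not by itself guarantee that individuals cannot be re-identified.

```python
import random


def generalise_age(age: int, width: int = 10) -> str:
    """Generalisation: replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def suppress(value: str, keep: int = 0) -> str:
    """Suppression: mask all but the first `keep` characters."""
    return value[:keep] + "*" * (len(value) - keep)


def null_out(record: dict, fields: list[str]) -> dict:
    """Nulling-out: return a copy with the named fields set to None."""
    return {k: (None if k in fields else v) for k, v in record.items()}


def shuffle_column(records: list[dict], field: str, seed: int = 0) -> list[dict]:
    """Shuffling: permute one column across records so values no longer line up with their rows."""
    values = [r[field] for r in records]
    random.Random(seed).shuffle(values)
    return [{**r, field: v} for r, v in zip(records, values)]
```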

Processing Data

Analysing Data

Page 14: Research Data Management and Librarians

Data Interpretation & Analysis

Data interpretation and analysis “is the process of assigning meaning” to the gathered information and ascertaining “the conclusions, significance, and implications of the findings” (Analyzing and Interpreting Data, n.d.).

Analysing Data

Page 15: Research Data Management and Librarians

Data Publishing

Data publishing

This is the process of making the research data underpinning the findings published in peer-reviewed articles available to readers and reviewers in an appropriate repository, or “as supplementary materials to a journal publication” (Corti et al., 2014: 197; Marques, 2013)

Data Journals

A more recent development has been the appearance of data journals. These journals publish data papers that describe a dataset, and also give an indication in which repository the dataset is available (Corti et al. 2014: 7-8).

Analysing Data

Librarians can be involved in creating and managing a data repository, and can give training and advice

Page 16: Research Data Management and Librarians

Examples of Data Repository Software

Page 17: Research Data Management and Librarians

Registry of Research Data Repositories

• re3data.org is a global registry of research data repositories that covers research data repositories from different academic disciplines.

• It presents repositories for the permanent storage and access of data sets to researchers, funding bodies, publishers and scholarly institutions.

• It can be used as a tool for the easy identification of appropriate data repositories to store research data.

Page 18: Research Data Management and Librarians

Data Journals

• A list of Data Journals is available at http://proj.badc.rl.ac.uk/preparde/blog/DataJournalsList

• Example of data journal at Elsevier: “Data in Brief”

Page 19: Research Data Management and Librarians

Data Visualisation

Data Visualisation is the visual representation of data, and is used to enable people to both understand and communicate information through graphical and schematic avenues (Friendly, 2009: 2; Schnell and Shetterley, 2013: 3)

Analysing Data

From Xiaoru Yuan’s presentation at CODATA Workshop on 12 June 2014

Page 20: Research Data Management and Librarians

Data Archiving

Data archiving can be described as the process of retention and storage of valuable data (this is data that will be essential for future reference) for long-term preservation, so that the data will be protected from risk (i.e. loss, or corruption), and will be accessible for future use (Rouse, 2010).

Preserving Data

Page 21: Research Data Management and Librarians

Data Preservation

Data preservation is “the process of providing enough representation information, context, metadata, fixity, etc. to the data so that anyone other than the original data creator can use and interpret the data” (Ruth Duerr, National Snow and Ice Data Center as cited by Choudhury, 2014)

Preserving Data

The Librarian can assist researchers in preparing data for long-term preservation, by advising on metadata standards

Page 22: Research Data Management and Librarians

Linking Data to research outputs

This is the process of connecting the underlying data relating to a specific research output, e.g. journal article, thesis, etc., to the research output itself. This can be done by adding a digital object identifier (DOI) to the dataset and including this in the metadata of the research output, or by citing the dataset (Callaghan et al., 2013).

Preserving Data

The Librarian can assist researchers, through training and consultation on DOIs and data citation methods

Page 23: Research Data Management and Librarians

Data Sharing

• Sharing data is the process of opening up access to research data and making it available to other researchers (Corti et al., 2014: 2).

• Data sharing provides “opportunities for other researchers to review, confirm or challenge research findings” (Data sharing and implementation guide, n.d.).

Giving Access to Data

Page 24: Research Data Management and Librarians

Data sharing Methods

The method for sharing data will depend on a variety of factors, including size and complexity of the dataset, sensitivity of the data collected, and anticipated number of requests for data sharing.

Researchers could

(1) Take responsibility for sharing data themselves, or

(2) Use a data archive, or

(3) Use a combination of these methods.

Page 25: Research Data Management and Librarians

Data repurposing/reuse

• This is the process where secondary data (data that have been captured and analysed by other researchers) can be re-analysed, reworked or re-used for new analyses, and compared with contemporary data (Corti et al., 2014: 169)

• This process “also enables research where the required data may be expensive, difficult or impossible to collect”, e.g. large scale surveys, or historic data (Corti et al., 2014: 169).

Re-using Data

Page 26: Research Data Management and Librarians

Data Citation

Data citation is the process of referencing (attributing and acknowledging) reused data in a similar fashion as traditional sources of information (Corti et al. 2014: 197).

Helpful Sources :

• Publication Manual of the American Psychological Association (APA, 2009)

• Oxford Manual of Style (OUP, 2012)

• Data Citation Awareness Guide (ANDS, 2011)

• Data Citation: What you Need to Know (ESRC, 2012)
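A data citation can be assembled from the same elements as a citation for an article. The sketch below uses an assumed element order (creators, year, title, publisher, identifier) that is common in data citation guidance; the exact punctuation and order should follow whichever style guide applies. The function name and all sample values are hypothetical.

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Format a dataset citation as 'Creators (Year): Title. Publisher. DOI link'."""
    names = "; ".join(creators)
    return f"{names} ({year}): {title}. {publisher}. https://doi.org/{doi}"


# Invented example values; 10.1000/182 is the example DOI used in this presentation.
citation = format_data_citation(
    ["Researcher, A.", "Colleague, B."], 2014,
    "Example survey dataset", "Example Data Repository", "10.1000/182")
```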

Re-using Data

The Librarian can assist researchers, through training and consultation in data citation methods

Page 27: Research Data Management and Librarians

Data Citation: DOI

DOI = Digital Object Identifier

To enable a unique and persistent identification of a digital object

A DOI is a unique alphanumeric string assigned through a registration agency (the registration agencies are coordinated by the International DOI Foundation) to identify a digital object, e.g. a data set. Metadata about the object is stored together with the DOI name. This may include a location, such as a URL, where the object can be found. (Wikipedia)

For example: http://dx.doi.org/10.1000/182
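The structure of a DOI name can be shown by splitting it at the first slash into a prefix and a suffix. The sketch below is a simplified illustration with hypothetical function names: it does not validate every rule of DOI syntax, and it does not URL-encode unusual characters in the suffix.

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first slash."""
    # Accept either a bare DOI name or a resolver URL such as http://dx.doi.org/10.1000/182.
    for lead in ("https://doi.org/", "http://doi.org/",
                 "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    prefix, _, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Return a link that resolves the DOI through the doi.org resolver."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{suffix}"
```

For the example above, `split_doi("http://dx.doi.org/10.1000/182")` separates the prefix `10.1000` from the suffix `182`.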

Re-using Data

The Librarian can assist researchers, through training and consultation on DOIs

[Diagram: the example DOI annotated with its parts: DOI Registry, Registrant, Specific Object]

Page 28: Research Data Management and Librarians

Provenance of Data

• history of a data file or data set

• this includes information

o on the person(s) responsible for the data set

o context of the data set

o revision history, including additions of new data and error corrections (Strasser et al., 2012: 7, 11)
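The three kinds of provenance information listed above map naturally onto a small record that travels with the data set. The sketch below is one possible shape for such a record; the class names, field names and JSON layout are assumptions for illustration, not a provenance standard.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
import json


@dataclass
class Revision:
    when: str    # ISO 8601 date of the change
    who: str     # person who made the change
    change: str  # what was added or corrected


@dataclass
class Provenance:
    dataset: str
    responsible: list[str]  # person(s) responsible for the data set
    context: str            # context of the data set
    revisions: list[Revision] = field(default_factory=list)

    def add_revision(self, who: str, change: str, when: date) -> None:
        """Append one entry to the revision history."""
        self.revisions.append(Revision(when.isoformat(), who, change))

    def to_json(self) -> str:
        """Serialise the whole record, including revisions, as JSON."""
        return json.dumps(asdict(self), indent=2)
```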

Page 29: Research Data Management and Librarians

Management of Big Data

Big data can be described in terms of its characteristics:

• Relative characteristics: denote those datasets which cannot be acquired, managed or processed on common devices within an acceptable time;

• Absolute characteristics: define big data through Volume, Variety, Veracity and Velocity (Huadong, 2014)

Big Data is part of a new science paradigm called Data Intensive Science, where scientists are overwhelmed with data sets from many different sources, e.g. captured by instruments, generated by simulations, and generated by sensor networks

Page 30: Research Data Management and Librarians

Absolute Characteristics of Big Data

• Volume: the scale of data that systems must ingest, process and disseminate;

• Variety: the complexity of the types of information handled (many sources and types of data both structured and unstructured)

• Velocity: the pace at which data flows in and out from sources like business processes, machines, networks and human interaction with things like social media sites, mobile devices

• Veracity: refers to the biases, noise and abnormality in data

http://inside-bigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/

Page 31: Research Data Management and Librarians

Role of Librarian in Big Data

• Create awareness among researchers about Big Data Initiatives internationally

• Create awareness among colleagues about the activities, workgroups and task groups of CODATA (Committee on Data for Science and Technology, of the International Council for Science) and Research Data Alliance

• Become a member of a number of CODATA task groups

Page 32: Research Data Management and Librarians

Examples of International Initiatives

Center for International Earth Science Information Network, Earth Institute, Columbia University

Computer Network Information Center, CAS

World Data Center for Microorganisms

Institute of Remote Sensing and Digital Earth, CAS

Dept of Earth Sciences

Institute for Environment and Human Security

Tetherless World Constellation

International Society for Digital Earth

Page 33: Research Data Management and Librarians

Pilot Projects at University of Pretoria

• The UP Library Services implemented two data management pilot projects in 2013-2014:

• Institute for Cellular and Molecular Medicine (ICMM) and the Neuro-Physio-Group

• An Open Source Document Management System was customised for this purpose

• Why Alfresco?

• Open Source

• Captured provenance of data

• Had a versioning function

• Good metadata function

• Easy to integrate with other software

• Workflow function gave supervisor overview of progress of students

• Sync function with Dropbox and Google Drive

• Drag and Drop function

• File Sharing function

• Mobile App

Page 34: Research Data Management and Librarians
Page 35: Research Data Management and Librarians
Page 36: Research Data Management and Librarians
Page 37: Research Data Management and Librarians

Long-term Preservation

Archival Information Package (AIP)

• BagIt format (Bag-it and tag-it)

• BagIt “bag” contains:

• Bag declaration file, manifest file, data files, metadata file (XML)

• METS wrapper

• Dublin Core and MODS (Descriptive Metadata)

• PREMIS (Preservation Metadata)
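The core of a BagIt bag (declaration file, manifest file and data files) can be produced with Python's standard library alone. The sketch below is a minimal illustration and not the workflow used in the pilot: the function name is hypothetical, the version string and the SHA-256 manifest are assumed choices, and the XML metadata file, METS wrapper and PREMIS records listed above are left out.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(source_files: list[Path], bag_dir: Path) -> Path:
    """Copy files into bag_dir/data and write the bag declaration and manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for source in source_files:
        target = data_dir / source.name
        shutil.copy2(source, target)
        # One manifest line per payload file: checksum, then path relative to the bag.
        checksum = hashlib.sha256(target.read_bytes()).hexdigest()
        manifest_lines.append(f"{checksum}  data/{source.name}")

    # Bag declaration: BagIt version and the encoding of the tag files (assumed values).
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8")
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8")
    return bag_dir
```

Because the manifest records a checksum for every data file, anyone receiving the bag can confirm that nothing was lost or altered in transfer.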

Next Phase

Page 38: Research Data Management and Librarians

Various stakeholders in RDM

Executive Management

Deans & Dept Heads

IT Services

Research Office

Principal Investigator/Researcher

Library

Funders

Publishers

External (disciplinary)

data repositories

(De Waard, Rotman and Lauruhn, 2014)

Page 39: Research Data Management and Librarians

Funders' Requirements: USA

(Dietrich et al. 2012)

http://www.istl.org/12-summer/refereed1.html

Page 41: Research Data Management and Librarians

Conclusion

This presentation showed that although the RDM environment looks daunting, the Library Professional can play an essential and much needed role in determining the success of Research Data Management initiatives at Higher Education Institutions.

This vast, untamed and complex environment is waiting for someone to conquer it. Librarians have the necessary skillset to do that.

May this motto also become our victory cry:

“Veni, vidi, vici” – I came, I saw, I conquered

Page 42: Research Data Management and Librarians

References

• Analyzing and interpreting data. Syracuse, NY: Office of Institutional Research and Assessment, Syracuse University, n.d. [Online] available at https://oira.syr.edu/assessment/assesspp/Analyze.htm (Accessed 18 September 2014).

• CALLAGHAN, S. et al. 2013. Connecting data repositories and publishers for data publication. Presentation delivered on 7 February 2013 at the OpenAIRE Interoperability workshop, University of Minho, held 7-8 February 2013. Braga, Portugal: University of Minho Gualtar Campus. [Online] available at http://openaccess.sdum.uminho.pt/wp-content/uploads/2013/02/7_SarahCallaghan_OpenAIREworkshopUMinho.pdf (Accessed 19 September 2014).

• CHARLES, K. 2012. Comparing enterprise data anonymization techniques. Newton, MA: TechTarget. [Online] available at http://searchsecurity.techtarget.com/tip/Comparing-enterprise-data-anonymization-techniques (Accessed 18 September 2014).

Page 43: Research Data Management and Librarians

References

• CHOUDHURY, S. 2014. Public Institution perspective (Research Library). Presented at the Digital Media Analysis, Search and Management (DMASM), 2014. [Online] available at http://dataconservancy.org/wp-content/uploads/2014/03/DC_DMASM_2014.pdf (Accessed 24 September 2014).

• CORMODE, G. AND SRIVASTAVA, D. 2009. Anonymized data: generation, models, usage. Tutorial at SIGMOD, July 2009. [Online] available at http://dimacs.rutgers.edu/~graham/pubs/papers/anontut.pdf (Accessed 17 September 2014).

• CORTI, L. et al. 2014. Managing and sharing research data: a guide to good practice. Los Angeles: SAGE.

• Data Management Planning Tool (DMPTool). Oakland, CA: University of California Curation Center of the California Digital Library, 2014. [Online] available at https://dmptool.org/ (Accessed 24 September 2014).

• Data sharing and implementation guide. Washington, DC: Institute of Education Sciences, U.S. Department of Education, n.d. [Online] available at http://ies.ed.gov/funding/datasharing_implementation.asp (Accessed 19 September 2014).

Page 44: Research Data Management and Librarians

References

• DCC. 2014. Overview of funders data policies. Edinburgh, UK: Digital Curation Centre. [Online] available at http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies (Accessed 24 September 2014).

• DCC. 2014. What are metadata standards? Edinburgh, UK: Digital Curation Centre. [Online] available at http://www.dcc.ac.uk/resources/briefing-papers/standards-watch-papers/what-are-metadata-standards (Accessed 24 September 2014).

• DE WAARD, A., ROTMAN, D. AND LAURUHN, M. 2014. Research data management at institutions: part 1: visions. Elsevier Library Connect, 6 February 2014. [Online] available at http://libraryconnect.elsevier.com/articles/2014-02/research-data-management-institutions-part-1-visions (Accessed 5 October 2014).

• DIETRICH, D. et al. 2012. De-mystifying the data management requirements of research funders. Issues in Science and Technology Librarianship, Summer, 2012, No. 70. [Online] available at http://www.istl.org/12-summer/refereed1.html (Accessed 22 September 2012).

Page 45: Research Data Management and Librarians

References

• DMPonline tool. Edinburgh, UK: Digital Curation Centre, 2014. [Online] available at https://dmponline.dcc.ac.uk/ (Accessed 22 September 2014).

• Edinburgh University Data Library Research Data Management Handbook, v.10, Aug, 2011. [Online] available at http://www.docs.is.ed.ac.uk/docs/data-library/EUDL_RDM_Handbook.pdf (Accessed 25 September 2014).

• FRIENDLY, M. 2009. Milestones in the history of thematic cartography, statistical graphics, and data visualization. [Sl.: s.n.] [Online] available at http://www.math.yorku.ca/SCS/Gallery/milestone/milestone.pdf (Accessed 19 September 2014).

• HODSON, S. 2014. Global collaboration in data science: an introduction to CODATA. Presentation on 6 June 2014 at CODATA International Training Workshop in Big Data for Science for Researchers from Emerging and Developing Countries, Beijing, China, 4-20 June 2014.

Page 46: Research Data Management and Librarians

References

• HUADONG, G. 2014. Scientific Big Data for knowledge discovery. Presentation on 8 June 2014 at the CODATA Workshop on Big Data for International Scientific Programmes: Challenges and Opportunities, Beijing, China, 8-9 June 2014.

• Library of Congress. 2014. PREMIS. Washington, DC: Library of Congress. [Online] available at http://www.loc.gov/standards/premis/ (Accessed 25 September 2014).

• A list of data journals. Trac Integrated SCM and Project Management. [Online] available at http://proj.badc.rl.ac.uk/preparde/blog/DataJournalsList (Accessed 25 September 2014).

• MARTIN, E. AND BALLARD, G. 2010. Data management best practices and standards for Biodiversity data applicable to Bird Monitoring Data. U.S. North American Bird Conservation Initiative Monitoring Subcommittee. [Online] available at http://www.nabci-us.org/ (Accessed 24 September 2014).

Page 47: Research Data Management and Librarians

References

• MARQUES, D. 2013. Research data driving new services. Elsevier Library Connect, 25 February 2013. [Online] available at http://libraryconnect.elsevier.com/articles/best-practices/2013-02/research-data-driving-new-services (Accessed 5 October 2014).

• NORMANDIEU, K. 2013. Beyond Volume, Variety and Velocity is the Issue of Big Data Veracity. Inside Big Data. [Online] available at http://inside-bigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/ (Accessed 25 September 2014).

• The Oxford Dictionary. [sl.]: Oxford University Press, 2014. [Online] available at http://www.oxforddictionaries.com/us/ (Accessed 16 September 2014).

• RAGHUNATHAN, B. 2013. The complete book of data anonymization: from planning to implementation. Broken Sound Parkway, NW: CRC Press, Taylor and Francis Group.

Page 48: Research Data Management and Librarians

References

• Research Data Services, University of Wisconsin-Madison. Madison, WI: University of Wisconsin Madison, 201. [Online] available at http://researchdata.wisc.edu/manage-your-data/data-backup-and-integrity/ (Accessed 24 September 2014).

• Responsible conduct of research. DeKalb, Illinois: Northern Illinois University Faculty Development and Instructional Design Center, n.d. [Online] available at: http://ori.dhhs.gov/education/products/n_illinois_u/datamanagement/dctopic.html. (Accessed: 16 September 2014).

• ROUSE, M. 2010. Data archiving. Techtarget. [Online] available at http://searchdatabackup.techtarget.com/definition/data-archiving (Accessed 19 August 2014)

• SCHNELL, K. AND SHETTERLEY, N. 2013. Understanding data visualization. [Sl.]: Accenture. [Online] available at http://www.accenture.com/SiteCollectionDocuments/PDF/Accenture-Tech-Labs-Data-Visualization-Full-Paper.pdf (Accessed 19 September 2014).

Page 49: Research Data Management and Librarians

References

• SIMPSON, J. n.d. Data Masking and Encryption Are Different. IRI Blog Articles. [Online] available at http://www.iri.com/blog/data-protection/data-masking-and-data-encryption-are-not-the-same-things/ (Accessed 18 September 2014).

• STRASSER, C. et al. 2012. Primer on data management: what you always wanted to know. [Albuquerque,NM]: DataONE, [University of New Mexico], p1-11. [Online] available at http://www.dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf (Accessed 28 August 2013)

• United States Environmental Protection Agency (US EPA). 2002. Guidance on Environmental Data Verification and Data Validation: EPA QA/G-8. Washington, DC: Environmental Protection Agency. [Online] available at http://www.epa.gov/QUALITY/qs-docs/g8-final.pdf (Accessed 24 September 2014).

• USGS Data Management. [Online] available at http://www.usgs.gov/datamanagement/describe/metadata.php (Accessed 19 August 2014).

Page 50: Research Data Management and Librarians

References

• VINOGRADOV, S AND PASTSYAK, A. 2012. Evaluation of data anonymization tools. In: DBKDA 2012 : The Fourth International Conference on Advances in Databases, Knowledge, and Data Applications, held 29 February-5 March, 2012, Reunion Island. Wilmington, DE: International Academy, Research, and Industry Association (IARIA).

• What is data management? University Park, PA: Publishing and Curation Services, Penn State University Libraries, 2014. [Online] available at http://www.libraries.psu.edu/psul/pubcur/what_is_dm.html (Accessed 25 September 2014).

• YUAN, X. 2014. Visualization and visual analytics. Presentation on 12 June 2014 at CODATA International Training Workshop in Big Data for Science for Researchers from Emerging and Developing Countries, Beijing, China, 4-20 June 2014.

