JISC-CNI 2014 ©UC Regents, 2014
July 10, 2014
Opening Up Data
MacKenzie Smith
University Librarian
University of California, Davis
At Creative Commons, we believe scientific data should be freely available to everyone. We call this idea Open Data. Creative Commons legal tools can be used to make data and databases freely available. We’ve already had successful implementations in taxonomic, energy, genomics, disease research, geospatial, polar, and bibliometric disciplines, and are providing guidance to funders, institutions, private foundations, governments, the corporate sector, and other stakeholders.
U.S. Funding Agency Policy
NIH (2003): “The NIH expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers.” (Applications over $500,000 must include a data sharing plan.)
NSF grant guidelines: “NSF ... expects investigators to share with other researchers, at no more than incremental cost and within a reasonable time, the data, samples, physical collections and other supporting materials created or gathered in the course of the work. It also encourages grantees to share software and inventions or otherwise act to make the innovations they embody widely useful and usable.” (2005 and earlier)
NSF: peer-reviewed Data Management Plan (DMP) required, January 2011
Credibility Crisis?
3/13/2014
Journal Data Sharing Policy (2011 / 2012)
Required as condition of publication, barring exceptions: 10.6% / 11.2%
Required but may not affect editorial decisions: 1.7% / 5.9%
Encouraged/addressed, may be reviewed and/or hosted: 20.6% / 17.6%
Implied: 0% / 2.9%
No mention: 67.1% / 62.4%
Source: Stodden, Guo, Ma (2013) PLoS ONE, 8(6)
Journal Code Sharing Policy (2011 / 2012)
Required as condition of publication, barring exceptions: 3.5% / 3.5%
Required but may not affect editorial decisions: 3.5% / 3.5%
Encouraged/addressed, may be reviewed and/or hosted: 10% / 12.4%
Implied: 0% / 1.8%
No mention: 82.9% / 78.8%
Source: Stodden, Guo, Ma (2013) PLoS ONE, 8(6)
Software in Scientific Discovery
JASA June issue: computational articles / code publicly available
1996: 9 of 20 / 0%
2006: 33 of 35 / 9%
2009: 32 of 32 / 16%
2011: 29 of 29 / 21%
Open Science reaches the White House
Executive Memorandum directing federal funding agencies to develop plans for public access to data and publications (Feb 2013)
“data is defined... as the digital recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications...”
Executive Order directing federal agencies to make their own data publicly available (May 9, 2013)
Current NIH View: Components of an Academic Digital Enterprise
• Consists of digital assets
  • Datasets, papers, software, lab notes
• Each asset is uniquely identified and has provenance, including access control
  • e.g., publishing simply involves changing the access control
• Digital assets are interoperable across the enterprise
Barriers to Open Data Sharing (Code / Data)
Time to document and clean up: 77% / 54%
Dealing with questions from users: 52% / 34%
Not receiving attribution: 44% / 42%
Possibility of patents: 40% / -
Legal barriers (e.g., copyright): 34% / 41%
Time to verify release with admin: - / 38%
Potential loss of future publications: 30% / 35%
Competitors may get an advantage: 30% / 33%
Web/disk space limitations: 20% / 29%
Survey of the Machine Learning Community, NIPS (Stodden 2010)
2014 White House Big Data and Privacy Review
Pass National Data Breach Legislation that provides for a single national data breach standard, along the lines of the Administration's 2011 Cybersecurity legislative proposal.
Higher Education responses
• Infrastructure
  • Developing new tools across the research life cycle
  • Mostly individual institutions or disciplines
  • National initiatives emerging (e.g., ARL/AAU/APLU SHARE initiative)
• Policy
  • Institutional Open Access policies
  • SHARE copyright group
• Training
  • ARL E-Science Institute
  • ARL SPEC Kit on RDM activities
  • Current events
New Tools for Computational Reproducibility
Dissemination Platforms, e.g., DataONE, Dataverse, RunMyCode.org
Workflow Tracking and Research Environments, e.g., VisTrails, Kepler, Taverna
Embedded Publishing, e.g., Sweave, knitr, VCR (Verifiable Computational Research)
Data Repositories
• Disciplinary
  • ICPSR, GenBank
  • Dryad, ONEShare
  • Sage Commons (Sage Bionetworks)
• Disciplinary/Institutional
  • Dataverse, Nesstar
• Institutional
  • IRs galore: e.g., UC’s Dash and Chronopolis, Purdue’s PURR, JHU’s Data Conservancy, Stanford Digital Repository, many local DSpace/Fedora/Hydra/Islandora instances; locally run and cloud hosted
  • Data Centers on every campus
• Generic/cloud
  • Figshare
  • DuraCloud
Dryad
“We continued to refine the infrastructure for linking between articles and data. The web service for returning the corresponding Dryad data DOI when queried with an article DOI is now being used by Elsevier to provide a link to the data from ScienceDirect for 40 different Elsevier journals that have at least one data package in Dryad. Dryad is an international collaborator in the EU-funded ORCID and DataCite Interoperability Network project (odinproject.eu), which this past year introduced a tool enabling researchers to add research outputs with DataCite DOIs (such as Dryad data packages) to their ORCID profiles. We also introduced regular updating of linkages between related records in PubMed, GenBank, and EuropePMC to data packages in Dryad. To further promote discoverability and accessibility, Dryad officially became a DataONE Tier 1 member node. Improvements to the curation interface have led to an increase in curation efficiency of more than 25% in the past year.”
Dryad Annual Report, 2013
Dryad: Embargo Usage
(A) Embargo selections by Dryad data authors for the 10,108 files in Dryad deposited by September 20, 2013. Data include only datasets related to articles published in journals for which the authors had the option of selecting an embargo. (B) Longer-term embargoes (>1 year) by journal that granted them.
Data Archiving: Suggestions to Increase Participation. PLoS Biol 12(1): e1001779. doi:10.1371/journal.pbio.1001779
Data Sharing: Discoverability
Coming
• NIH data catalog (part of the BD2K initiative)
• SHARE registry
Here now
• Thomson Reuters Data Citation Index
• OCLC WorldShare (includes OAIster)
• Google/Google Scholar
Data Sharing: Identifiers
• DOIs for Data (DataCite, CrossRef, EZID)
• ORCIDs for Researchers
• FundRef for funding agencies
• Still missing good institutional identifiers
Dealing with Data Rights
• An IP rights strategy, including the promotion of university-based Open Access policies and favorable licensing terms, will be part of the scaffolding that will enable the layers of SHARE to develop.
• Rights subgroup formed to deal with this
• A broad collective action by AAU and APLU – to be discussed with AAU Presidents in April
Key finding: RDM Service Offering
[Bar chart of RDM services offered, grouped as Data management planning, Data management support, and Data sharing & archiving; labeled bars include DMP consulting, Other data management training, Data citation support, and Data archiving by library; scale 0 to 60]
ARL SPEC Kit 334: Research Data Management Services (July 2013) http://publications.arl.org/Research-Data-Management-Services-SPEC-Kit-334/
Data management planning
[Bar chart comparing DMP training and DMP consulting; values shown: 89% (N = 48) and 61% (N = 33)]
ARL SPEC Kit 334: Research Data Management Services (July 2013) http://publications.arl.org/Research-Data-Management-Services-SPEC-Kit-334/
What You Need to Know about Writing Data Management Plans
An ACRL e-Learning Online Course, July 14-August 1, 2014
Description: Demand for data management plan consultants is growing as more granting agencies add this requirement. Most presentations concerning data management do not provide practical advice on how to consult with researchers writing a data management plan for grant submission. This course teaches participants about the elements of a successful data management plan, and provides practice critiquing data management plans in a supportive learning environment where no grant funding is at stake. Join two experienced data management plan consultants with experience in liaison librarianship and information technology as they demonstrate how all librarians have the ability to successfully consult on data management plans. Each week will include assigned readings, a written lecture, discussion questions, weekly assignments, and live chats with the instructors.
Participants will examine how data and metadata are defined, open data formats, dark archives, and secure repositories, as well as specialty concerns such as how to securely preserve information related to at-risk populations. Selection of effective long-term data preservation and sharing strategies will also be examined. Lastly, participants will evaluate sample data management plans from the sciences, social sciences, and the arts and humanities as a final project for the course. Critiques of each plan will be presented to the class during the final chat session at the end of the course.
Learning Outcomes:
• List specific data depository resources in order to formulate recommendations for researchers to securely deposit and share their data.
• Learn how different funding agencies, and departments within those agencies, have different requirements for data management plans in order to determine how to effectively advise each researcher according to the requirements for their specific plan.
• Analyze sample data management plans in order to develop an understanding of what constitutes a thorough data management plan.
Presenters: Dee Ann Allison, Professor, University of Nebraska-Lincoln; Kiyomi Deards, Assistant Professor, University of Nebraska-Lincoln
Course Requirements: Your participation will require approximately three to five hours per week of primarily asynchronous activities to:
• Read the online seminar material
• Post to online discussion boards
• Attend synchronous chat sessions (optional)
• Complete online exercises
• Complete a seminar evaluation form
New England Collaborative Data Management Curriculum
CLIR Postdoc Fellowship Program
“CLIR Postdoctoral Fellows work on projects that forge and strengthen connections among library collections, educational technologies, and current research. The program offers recent PhD graduates the chance to help develop research tools, resources, and services while exploring new career opportunities. Host institutions benefit from fellows' field-specific expertise by gaining insights into their collections' potential uses and users, scholarly information behaviors, and current teaching and learning practices within particular disciplines.”
• >110 fellows so far
• UC Davis postdoc in neuroscience: Jonathan Cachat
We get it already…
“Painstakingly detailed surveys have been performed across several research organizations, particularly in North America (CLIR; ARL; CDL), Europe (DCC; RIN; NESTA) and Australia (ANDS). The same overall picture emerges:
• Research data is found in a dizzying number of file formats (some proprietary)
• Research data can be digital or non-digital
• Lack of metadata & documentation
• Data storage is disparate, unorganized, unsecured and researchers need more space
• Researchers welcome help with federal funding mandates (Data Management Plans)
• PIs are not concerned with data sharing preparation – a time-consuming, thankless job in the current publish-or-perish merit system”
Do No Harm
“There is ample evidence of a need for research data management services as provided by reports published from libraries and organizations (cited above). However, it is critical to recognize that sloppy record keeping and the constant, fast-paced striving for bigger, faster, stronger technological infrastructure are inherent to the scientific enterprise. Any services that sterilize or mandate rigid process control may provide solutions to specific data concerns, but will do so at a detriment to science – not an ideal solution.”
Amari, Beltrame, Bjaalie, & Dalkara, 2002; Gardner et al., 2003; Kubilius, 2014; Landreth & Silva, 2013; Wallis et al., 2013; White, Baldridge, Brym, Locey, & McGlinn, 2013.
“Mandated changes that are detrimental to the flow rate of a daily research enterprise will not be successful. This challenges the core of research data management, curation and service efforts. It highlights the fact that sometimes efforts to help an external group (e.g., neuroscientists) with internal expertise (e.g., library skill sets), even with the best intentions and solid rationale, can be unhelpful and unsustainable.”
“The problem we are trying to solve is advancing the environmental support and training provided by the university to researchers and students in order to fulfill its mission. Researchers and students are aware of the growing popularity and potential of big data, open data, interdisciplinary data. They desire opportunities, skills and support.
Advancing the environmental support will improve their research, it will improve their education – it gives them an edge, and for that a university is recognized.”
Requires
• Less emphasis on infrastructure
• More emphasis on policy
  • Citation practices in different research disciplines for data, software
  • Legal tools for data and software sharing in different contexts
• Lots more emphasis on training and culture change
  • Not of librarians, but researchers themselves
Questions?