Challenges for libraries in data curation
Consol Garcia
Librarian, Biblioteca Campus del Baix Llobregat
Universitat Politècnica de Catalunya - Barcelona Tech
PKP 2011, Berlin
PKP Conference, Berlin 2011
Challenges for libraries in data curation
Something about data: why data? why open? why share data?
Something about libraries: what we need to do, what challenges we face, opportunities, social networks
What's going on? European Research Council, JISC, SURF, Germany
Data, Linked data, Linked Open data
✔ Because computationally intensive science is being carried out
✔ This is a data society, due to e-science (new methodology emerging from broadband communications networks, software and infrastructure)
✔ Web of data intended to enable computers to understand the semantics (…)
Open Data / Open Access
Regarding data...
✔ Are we where OA was 10 years ago?
✔ Both share goals
✔ Same stakeholders and work teams
  ✔ Biomed, Wellcome Trust, scholarly publishing, researchers, ...
✔ Movements that benefit from each other
The Berlin Declaration mentions metadata, raw data and other materials
Sharing data
✔ It's much more than sharing
✔ It's deposit and preservation
✔ It's access, use and reuse
Data Life cycle
✔ It's much more than a life span
✔ It's a cycle that, properly managed, will enable access, evaluation and re-use over time
Why share data?
✔ To verify data
✔ To retain data integrity
✔ PARSE study: 98% agree that if research is publicly funded, the results should become public property and be properly preserved
Why not share it?
✔ Researchers want to use their results as intellectual capital
✔ Researchers can sell their data
✔ It takes time, effort and money
✔ No data standards within a discipline
✔ Idiosyncratic research practices
What's important? Technical perspective
What's important? Cultural barriers
✔ Scientists must be aware of data management
✔ Changing the culture of science from publications to data
✔ Ensure proper citation (technology will help)
✔ Social tools have great potential to speed up scientific discovery
What needs to happen?
✔ Build an infrastructure
  ✔ (it is kind of everybody's problem, and therefore it's nobody's problem; Boyle)
✔ Design good online tools
  ✔ (understand how science works)
✔ Create cultural change
  ✔ Top-down strategy (open access movement)
  ✔ Bottom-up (how to measure contributions)
Arxiv and SPIRES
When will it happen?
✔ When researchers find it useful
✔ When researchers get credit for doing it
✔ When funders require it
✔ When publishers require it or find it useful
✔ When the recommendations and policies are checked
When should it be done?
✔ Before publication
  ✔ Human Genome Project
  ✔ Arxiv.org
✔ After publication
  ✔ NIH
  ✔ PANGAEA
Examples
✔ Protein Data Bank (prepublication)
✔ NCBI: GenBank
✔ Sloan Digital Sky Survey
✔ PLOS and PMC (editors)
✔ Arxiv.org (repositories)
✔ DataONE (Data Observation Network for Earth) under the NSF DataNet programme
✔ PANGAEA
✔ A lot of work has been done, but technical, legal and cultural barriers remain
Where to begin: roadmap
✔ Self-assessment / Data audit
  ✔ Digital Curation Centre checklist
✔ Should provide guidance for different:
  ✔ Data producers (quality of data)
  ✔ Data users (fair use)
  ✔ Funding agencies (mandate data)
  ✔ Repositories (storage, preservation of data, DRAMBORA)
✔ On the requirements in international projects
What's going on?
✔ Carlos Morais Pires: e-infrastructures and scientific data
✔ Some EU projects targeting environment-related data (ENV 2012.6)
✔ JISC MRD Programme
✔ University of Edinburgh has a policy plan for RDM
✔ PANGAEA, funded by the German Research Council in 2010
✔ Seal of Approval (DANS)
✔ Spain is just beginning
Librarian's role
✔ Increasing volume & types of data
✔ Subject librarian could be the curator
✔ Should have skills and knowledge of:
  ✔ Scientific research
  ✔ Operating systems
  ✔ Database management systems
  ✔ Scripting languages
✔ Some functions:
  ✔ Streamline submission to databases
  ✔ Automate curation
  ✔ Standardize data
  ✔ Facilitate contributions to annotation
  ✔ Editing and teaching?
Librarian's role
✔ Self-teaching, on-the-job experience
✔ Training at LIS:
  ✔ University of Illinois at Urbana-Champaign
  ✔ Digital Curation Centre DC101
✔ Journals and conferences
✔ Could help with:
  ✔ Metadata
  ✔ Copyright
  ✔ Advocacy
  ✔ Archiving and long-term preservation
  ✔ Citing data
Conclusions
✔ Librarians could/will be involved at any stage in the research process and collect the pieces
✔ Deep partnership between library and researchers is necessary
✔ Focus on small-scale solutions
✔ Be aware of metadata schemas and vocabularies within a discipline
✔ Librarians and researchers are still learning how to manage research data
Conclusions
✔ Different approaches depending on:
  ✔ funding agencies
  ✔ subject disciplines
    ✔ Physics
    ✔ Meteorology
    ✔ Astronomy
    ✔ Life science
  ✔ world region
  ✔ age of researchers
✔ 50% of the respondents to the Tenopir et al. survey reported that neither the organization nor the project provides funds to manage data
Conclusions
✔ No data standards within a discipline
✔ Complexity among data objects
✔ Some communities are willing to share, but there's no data center to send the data to
✔ Sometimes the problem lies in the quantity and quality of data
✔ Mandates flexibility
Thank you!