
Reuse for research, presentation, idcc17

Date post: 12-Apr-2017
Upload: michael-svendsen
Transcript
Page 1

Reuse for Research

Curating Astrophysical Datasets for Future Researchers

Practice Paper, IDCC17

Anders Conrad, Royal Danish Library
Michael Svendsen, Royal Danish Library

Rasmus Handberg, Aarhus University

Page 2

The NASA Kepler/K2 Mission

Read about the mission at https://kepler.nasa.gov/Mission/QuickGuide/

Page 3

The Kepler Photometer

Page 4

From Space to Aarhus…

Spacecraft

Deep Space Network

NASA MAST archive

KASOC archive, Aarhus

KASC scientists/working groups

KASOC website (kasoc.phys.au.dk)

Page 5

The challenge - Where Next?

•Data will remain valuable for active research for at least 50 years!

•Who will take care of the data once the current research organisation (Kepler Asteroseismic Science Consortium, KASC) no longer exists?

•How can data be kept accessible for continued active research?

Page 6

KASC requirements for a Living Archive

•Available for 50 years

•Always freely available online

•Continue to be used for active research

•Extendable: New information can be added

•Formats must be readable by both humans and computers

•Understandable and useful for future researchers – no matter the science case

Page 7

Future workshops - Reuse for Research

•For which research questions might future researchers find this data useful?

•How would they most likely want to see data packaged?

•What documentation is needed to understand data outside the current context?

•What search criteria would most likely be used to discover data?

Page 8

The 50-Year Issue

•Institutionally: Who can offer more than 5-10 years of storage and preservation?

•Financially: Who will pay?

•Technically: How will data remain readable and understandable?

•Scientifically: How will data remain useful and trustworthy?

Page 9

From “Who” and “How” to…

•How best to:

•Structure datasets in a way that is most useful for research

•Use formats that are suitable for long-term preservation

•Secure sufficient contextual and specific documentation for scientific reuse

•Facilitate cross-institutional collaboration to provide a sustainable service

•Secure access and discoverability according to scientific needs

•Secure the possibility of continued deposit

Page 10

Dataset Structure

•One self-contained dataset for each star
   •5 different types of data products
   •Dataset-specific documentation
   •TOC file (machine- and human-readable)
   •References to publications (bibcodes)

•One generic documentation package
   •E.g. NASA and KASC release notes
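Bibcodes are the 19-character publication identifiers used by the NASA Astrophysics Data System (ADS). A minimal sketch of splitting one into its fixed-width fields, which a future tool could use to index the publication references stored with each dataset. It assumes the standard `YYYYJJJJJVVVVMPPPPA` layout with dots as padding; `Bibcode` and `parse_bibcode` are hypothetical names, not part of any existing library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bibcode:
    year: int
    journal: str    # journal abbreviation, padding dots removed
    volume: str
    qualifier: str  # single character, "." when unused
    page: str
    initial: str    # first letter of the first author's surname


def parse_bibcode(code: str) -> Bibcode:
    """Split a 19-character ADS bibcode (YYYYJJJJJVVVVMPPPPA) into its fields.

    Dots are padding, so they are stripped from the fixed-width
    journal, volume and page fields (assumed layout, see lead-in).
    """
    if len(code) != 19:
        raise ValueError(f"bibcode must be 19 characters, got {len(code)}: {code!r}")
    year = code[0:4]
    if not year.isdigit():
        raise ValueError(f"bibcode must start with a 4-digit year: {code!r}")
    return Bibcode(
        year=int(year),
        journal=code[4:9].rstrip("."),  # journal is left-justified
        volume=code[9:13].strip("."),   # volume is right-justified
        qualifier=code[13],
        page=code[14:18].strip("."),    # page is right-justified
        initial=code[18],
    )
```

For example, `parse_bibcode("2010Sci...327..977B")` yields year 2010, journal `Sci`, volume `327`, page `977` and initial `B`.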

Page 11

One BagIt Archive for Each Star

Kepler_10.zip
│   bag-info.txt
│   bagit.txt
│   fetch.txt
│   manifest-sha1.txt
│
└───data
    │   bundle.xml
    │   readme.txt
    │
    ├───datafiles
    │   └───...
    ├───additional_files
    │   └───...
    ├───documentation
    │   └───...
    └───stellar_models
        └───...
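The archive listing follows the BagIt layout: a `data/` payload directory plus tag files such as `bagit.txt`, `bag-info.txt` and `manifest-sha1.txt`. A minimal sketch of how such a bag could be produced and later checked, assuming the payload already sits under `data/`. The function names are hypothetical, `fetch.txt` handling is left out, and file names containing line breaks or `%` (which the BagIt specification requires to be percent-encoded in manifests) are not handled.

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Return the hex SHA-1 digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def make_bag(bag_dir: Path, info: dict[str, str]) -> None:
    """Write the tag files that turn bag_dir (payload under bag_dir/data) into a minimal bag."""
    data_dir = bag_dir / "data"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"payload directory missing: {data_dir}")
    # bagit.txt: the bag declaration
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    # manifest-sha1.txt: one "<checksum>  <path relative to the bag>" line per payload file
    lines = []
    for f in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        rel = f.relative_to(bag_dir).as_posix()
        lines.append(f"{sha1_of(f)}  {rel}\n")
    (bag_dir / "manifest-sha1.txt").write_text("".join(lines), encoding="utf-8")
    # bag-info.txt: "Label: value" metadata lines
    (bag_dir / "bag-info.txt").write_text(
        "".join(f"{k}: {v}\n" for k, v in info.items()), encoding="utf-8"
    )


def verify_bag(bag_dir: Path) -> list[str]:
    """Return payload paths that are missing or whose SHA-1 no longer matches (empty = intact)."""
    bad = []
    manifest = (bag_dir / "manifest-sha1.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel = line.split(maxsplit=1)
        target = bag_dir / rel
        if not target.is_file() or sha1_of(target) != digest:
            bad.append(rel)
    return bad
```

SHA-1 is used here only to mirror the `manifest-sha1.txt` in the listing; the current BagIt specification (RFC 8493) favours stronger algorithms such as SHA-512 or SHA-256 for new bags. For production use, an established tool such as the Library of Congress `bagit-python` package is likely a safer choice than hand-written code.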

Page 12

Documentation for Each Dataset

<star kic="12345678">
  <numax value="3100" error="20" unit="uHz" />
  <mass value="1.0" error="0.01" unit="solar" />
  <radius value="1.0" error="0.01" unit="solar" />
  <datafiles>
    <datafile uid="1" path="datafiles/original/kplr12345678_llc.fits" />
    <datafile uid="2" path="datafiles/kasoc.ts/kplr12345678_kasoc.ts.fits">
      <dependency datafile="1" />
    </datafile>
    …
  </datafiles>
  <model path="stellar_models/kic12345678/" />
</star>

● The bundle.xml file
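The `<dependency>` element suggests that some data files are derived from others (here, the KASOC time series in datafile 2 depends on the original light curve in datafile 1). A minimal sketch of reading `bundle.xml` with Python's standard `xml.etree.ElementTree` and ordering the files so each one comes after its inputs. Element and attribute names are taken from the slide; the function names and the reading of `<dependency>` as "derived from" are assumptions. It also assumes straight ASCII quotes around attribute values, since typographic quotes make the XML unparseable.

```python
import xml.etree.ElementTree as ET


def load_bundle(xml_text: str) -> tuple[str, dict[str, str], dict[str, list[str]]]:
    """Return (KIC id, {uid: path}, {uid: [uids it depends on]}) from bundle.xml text."""
    star = ET.fromstring(xml_text)
    paths: dict[str, str] = {}
    deps: dict[str, list[str]] = {}
    for df in star.iter("datafile"):
        uid = df.get("uid")
        paths[uid] = df.get("path")
        deps[uid] = [d.get("datafile") for d in df.findall("dependency")]
    return star.get("kic"), paths, deps


def processing_order(deps: dict[str, list[str]]) -> list[str]:
    """Depth-first topological sort: every uid appears after the uids it depends on."""
    order: list[str] = []
    state: dict[str, str] = {}  # uid -> "visiting" or "done"

    def visit(uid: str) -> None:
        if state.get(uid) == "done":
            return
        if state.get(uid) == "visiting":
            raise ValueError(f"circular dependency involving datafile {uid}")
        state[uid] = "visiting"
        for parent in deps.get(uid, []):
            visit(parent)
        state[uid] = "done"
        order.append(uid)

    for uid in deps:
        visit(uid)
    return order
```

An ordering like this could, for example, let a future curator regenerate or re-validate derived products in a safe sequence.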

Page 13

Proof-of-concept - Repository Setup

•Using Dataverse repository software
   •Support for astrophysics metadata
   •Discoverability and citability (DataCite DOIs)
   •APIs for an automatic ingest workflow
   •Versioning, allowing redeposit of extended versions of datasets

•Issues:
   •Missing numeric fields for celestial coordinates (for discovery)
   •Limited options for mapping to external storage (we use erda.dk)
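Numeric coordinate fields matter because position-based discovery needs arithmetic on right ascension and declination. A minimal sketch, assuming each record stores `ra` and `dec` as numbers in degrees (hypothetical field names, not a Dataverse schema): a cone search that keeps the records within a given angular radius, using the haversine form of the great-circle distance.

```python
import math


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle distance between two sky positions, all in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # clamp guards against rounding pushing h slightly above 1
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(records: list[dict], ra0: float, dec0: float, radius_deg: float) -> list[dict]:
    """Return the records whose numeric 'ra'/'dec' (degrees) lie within radius_deg of (ra0, dec0)."""
    return [r for r in records
            if angular_separation_deg(r["ra"], r["dec"], ra0, dec0) <= radius_deg]
```

With coordinates stored only as free text, every record would have to be parsed before a query like this could run, which is one reason numeric fields help discovery.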

Page 14

Institutional Collaboration

Page 15

Conclusions (as of February 2017)

•Data packages designed in a way that can outlive repository software
   •Caveat: this may limit the use of some repository features

•Preservation actions will still be possible later, even if we do not plan them now

•We are still working on establishing funding and a sustainable business model

•We need to establish a production environment for the repository

Page 16

Reuse for Research

Contact: Michael Svendsen, @tullemich, Royal Danish Library

