
Software Citation and a Proposal (NSF workshop at Harvard Medical School)

Date post: 13-Feb-2017
Upload: james-howison
Transcript
Page 1

@jameshowison

Software in the scientific literature:

Software mentions and a provocative proposal

James Howison
Information School
University of Texas at Austin

This material is based upon work supported by the National Science Foundation under Grant No. SMA-1064209.

Page 2

What does a citation do, anyway?

• Gives credit for contribution
  – A key reward that drives activity in science
  – Sits alongside publications, grants, promotions, and prizes
  – Rewards drive type of artifacts and collaboration
• Explains the method used
  – Citations assist in knowing what was done
  – Provenance
  – Replication and extension

Page 3

How problematic are current practices?

• How is software mentioned in papers?
• How accessible and reusable is the software mentioned?
• How well do these mentions perform the functions of citation?

github.com/jameshowison/softcite
DOI: 10.6084/m9.figshare.1146366

Howison, J., & Bullard, J. (2015). Software in the scientific literature: Problems with seeing, finding, and using software mentioned in the biology literature. Journal of the Association for Information Science and Technology (JASIST). doi: 10.1002/asi.23538

Page 4

Sample and Method

• 90 randomly selected articles from the biology literature, published between 2000 and 2010.

• Journals stratified across Journal Impact Factor to balance coverage with influence

Page 5

Content analysis scheme

Manual content analysis (3 coders, Kappa)

1. Identifying mentions
  – Read article, locate a mention of a piece of software
2. Identify in-text characteristics of mention
  – Name of software? URL? Date? Version number? In bibliography? Cite to paper/manual/webpage?
3. Functions of mention
  – Identifiable? Findable? Accessible? Source? Match preferred citation?
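
The three coding steps above could be captured as one record per mention. The following is a minimal Python sketch, not the study's actual scheme (that is published in the softcite repository): the class name, field names, and the `article_id` value are hypothetical, and the example sentence is one quoted elsewhere in the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SoftwareMention:
    """One coded software mention. All field names are hypothetical."""
    article_id: str
    text: str  # the sentence containing the mention (step 1)

    # Step 2: in-text characteristics of the mention
    software_name: Optional[str] = None
    url: Optional[str] = None
    date: Optional[str] = None
    version: Optional[str] = None
    in_bibliography: bool = False
    cite_target: Optional[str] = None  # "paper", "manual", "webpage", or None

    # Step 3: functions of the mention, judged by a coder (None = not yet coded)
    identifiable: Optional[bool] = None
    findable: Optional[bool] = None
    accessible: Optional[bool] = None
    source_available: Optional[bool] = None
    matches_preferred_citation: Optional[bool] = None

    def missing_characteristics(self) -> List[str]:
        """List the step-2 characteristics this mention does not provide."""
        present = {
            "name": self.software_name is not None,
            "url": self.url is not None,
            "date": self.date is not None,
            "version": self.version is not None,
            "bibliography entry": self.in_bibliography,
        }
        return [label for label, ok in present.items() if not ok]


mention = SoftwareMention(
    article_id="example-article",  # placeholder identifier
    text="... were analyzed using MapQTL (4.0) software.",
    software_name="MapQTL",
    version="4.0",
)
print(mention.missing_characteristics())  # ['url', 'date', 'bibliography entry']
```

Keeping the step-3 judgments as `Optional[bool]` separates "not yet coded" from "coded as no", which matters when several coders work through the same articles.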

Page 6

https://github.com/jameshowison/softcite/blob/master/data/software-citation-coding.ttl

Page 7

How many mentions?

• 59 articles mentioned software, 31 did not.
• There were 286 distinct mentions of software.
• Those mentions were to 146 distinct pieces of software.
  – This includes general purpose (e.g., Microsoft Excel) and science-specific software (e.g., DENZO, BLAST).
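
For scale, the counts above translate into a few simple rates. This short Python sketch only does arithmetic on the numbers from the slide; the variable names are illustrative, and the derived averages are not figures reported in the talk.

```python
# Counts taken from the slide.
articles_sampled = 90
articles_with_software = 59
articles_without_software = 31
mentions = 286
distinct_packages = 146

# Sanity check: the two article groups partition the sample.
assert articles_with_software + articles_without_software == articles_sampled

share_with_software = articles_with_software / articles_sampled
mentions_per_mentioning_article = mentions / articles_with_software
mentions_per_package = mentions / distinct_packages

print(f"{share_with_software:.1%} of articles mention software")            # 65.6%
print(f"{mentions_per_mentioning_article:.2f} mentions per such article")   # 4.85
print(f"{mentions_per_package:.2f} mentions per distinct package")          # 1.96
```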

Page 8

Types of mentions

Mention Type | Example
Cite to Publication | … was calculated using biosys (Swofford & Selander 1981).
Cite to Project Name or Website | … using the program Autodecay version 4.0.29 PPC (Eriksson 1998). Reference List has: ERIKSSON, T. 1998. Autodecay, vers. 4.0.29 Stockholm: Department of Botany.
Like Instrument | … calculated by t-test using the Prism 3.0 software (GraphPad Software, San Diego, CA, USA).
URL in text | … freely available from http://www.cibiv.at/software/pda/ .
In-text name mention only | … were analyzed using MapQTL (4.0) software.
Not even name mentioned | … was carried out using software implemented in the Java programming language.
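
One way to read the mention types above is as a decision rule over a few coded features. The Python sketch below is an illustration only: the feature names and the order in which the rules are checked are assumptions, not the coding rules used in the study.

```python
from enum import Enum
from typing import Optional


class MentionType(Enum):
    CITE_TO_PUBLICATION = "Cite to Publication"
    CITE_TO_PROJECT = "Cite to Project Name or Website"
    LIKE_INSTRUMENT = "Like Instrument"
    URL_IN_TEXT = "URL in text"
    NAME_ONLY = "In-text name mention only"
    NOT_EVEN_NAME = "Not even name mentioned"


def classify(
    *,
    has_name: bool,
    reference_kind: Optional[str] = None,  # "publication", "project", or None
    has_vendor_and_location: bool = False,
    has_url: bool = False,
) -> MentionType:
    """Map hypothetical coded features to a mention type (assumed priority order)."""
    if reference_kind == "publication":
        return MentionType.CITE_TO_PUBLICATION
    if reference_kind == "project":
        return MentionType.CITE_TO_PROJECT
    if has_vendor_and_location:
        return MentionType.LIKE_INSTRUMENT
    if has_url:
        return MentionType.URL_IN_TEXT
    if has_name:
        return MentionType.NAME_ONLY
    return MentionType.NOT_EVEN_NAME


# The Prism example: named, with vendor and city, but no reference list entry.
print(classify(has_name=True, has_vendor_and_location=True).value)  # Like Instrument
# The Java example: no software name at all.
print(classify(has_name=False).value)  # Not even name mentioned
```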

Page 9

Types of Mentions

Page 10

Simpler Mention Kinds

Page 11

By Strata?

Page 12

What sort of software mentioned?

Page 13

Proprietary software more likely to be mentioned “like instrument”

Page 14

How useful are these mentions?

Page 15

Not much change across strata

Page 16

Do mention types work differently?

Page 17

Other findings

• Only 24% of journals had policies that mentioned software, declining by strata.
  – Rarely mention versions.
  – Not clear that these are followed.
• Only 13–30% of packages make a specific request for a particular type of citation
  – 32% of mentions didn’t follow the requested citation.

Page 18

Visible citation formats as “nudge”

• Some disagreement about how important the text of a publication is:
  – Should effort focus on machine readable “meta-data” in publication repositories (not in paper)?
  – Or focus on human readable formats in the paper?
• My position is that human readable will influence practice more quickly
• Formal, well-structured formats and policies act as a “nudge” to shape how authors mention software.

Page 19

Software archiving

• Strong finding that many pieces of software were not findable.
  – 1 in 10 packages could not be found at all
  – For only 1 in 20 packages could the specific version be found (combination of missing version info and missing versions online)
• Analogous to link-rot for URLs in publications (Koehler, 1999)
• Need to influence how software is archived
  – Is that a role for publishers? Escrow for non-open software?

Page 20

Part 2

But what are we working to incentivize anyway?

Page 21

Howison, J., & Herbsleb, J. D. (2013). Incentives and integration in scientific software production. In Proceedings of the ACM Conference on Computer Supported Cooperative Work (pp. 459–470). San Antonio, TX.

Page 22

Citation and collaboration

• What is the impact on collaboration of credit-giving through citations?

• Can a citation (of any kind) incentivize an ongoing collaboration able to do the work needed to keep a piece of software scientifically functional?

• Could a standard undermine collaboration further?

Page 23

Can citation incentivize maintenance?

• Software relies on other software
  – Dependencies all the way down
  – Software stacks change quickly (new opportunities, new problems, new libraries)
• Scientists seek to extend the work of others, not just re-execute it.
• Many re-implementations come from frustration with poorly maintained software
  – Software that wasn’t adjusted as its dependencies changed
  – Software that wasn’t updated with newer techniques

Page 24

A modest proposal

1. Papers have full workflow available
2. Workflows have regression tests running on a continuous integration system
3. Integration system pulls all new versions of dependencies, executes regression tests.
4. On fail (build or tests) the paper is retracted.

Howison, J. (2014). Retract bit-rotten publications: Aligning incentives for sustaining scientific software. In Working towards Sustainable Software for Science: Practice and Experiences (SuperComputing 2014 Workshop). New Orleans.
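
The four steps of the proposal can be sketched as a single integration run. This is a toy Python model under stated assumptions, not a real continuous integration setup: `Paper`, `integration_run`, the status strings, and the `statslib` dependency are all hypothetical, and a regression test is reduced to a function of the dependency versions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Versions = Dict[str, str]  # dependency name -> version string


@dataclass
class Paper:
    title: str
    pinned: Versions                                     # step 1: workflow and dependencies available
    regression_tests: List[Callable[[Versions], bool]]   # step 2: tests attached to the workflow
    status: str = "published"


def integration_run(paper: Paper, latest: Versions) -> str:
    """Steps 3 and 4: upgrade every dependency, run the tests, act on failure."""
    # Step 3: pull the newest version of each dependency that has one.
    candidate = {name: latest.get(name, old) for name, old in paper.pinned.items()}
    try:
        ok = all(test(candidate) for test in paper.regression_tests)
    except Exception:
        ok = False  # a crash stands in for a build failure here
    if ok:
        paper.pinned = candidate
    else:
        paper.status = "retracted"  # step 4 as originally stated
    return paper.status


def result_is_stable(versions: Versions) -> bool:
    # Hypothetical regression test: the analysis only works with statslib 1.x.
    return versions["statslib"].startswith("1.")


paper = Paper("Example paper", {"statslib": "1.2"}, [result_is_stable])
print(integration_run(paper, {"statslib": "1.3"}))  # published
print(integration_run(paper, {"statslib": "2.0"}))  # retracted
```

The point of the sketch is that the failure is triggered by a change outside the paper: nothing in the paper's own workflow changed between the two runs.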

Page 25

Uh …

• Retraction too strong, you say?

Ok, let’s revisit step 4:
• On fail, the paper is marked “provisionally non-extendable” and authors have some period to fix before it is marked as “retired”.
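
The revised step 4 amounts to a small status rule with a grace period. A minimal Python sketch follows; the 90-day period is an arbitrary assumption (the slide says only "some period"), "extendable" is an assumed label for the healthy state, and the function name is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

GRACE_PERIOD = timedelta(days=90)  # assumption: the slide does not give a length


def next_status(
    tests_pass: bool, today: date, failing_since: Optional[date]
) -> Tuple[str, Optional[date]]:
    """Return the paper's status and the date its current failure began."""
    if tests_pass:
        return "extendable", None
    if failing_since is None:
        # First failing run: start the clock.
        return "provisionally non-extendable", today
    if today - failing_since >= GRACE_PERIOD:
        return "retired", failing_since
    return "provisionally non-extendable", failing_since


status, since = next_status(False, date(2017, 1, 1), None)
print(status)  # provisionally non-extendable
status, since = next_status(False, date(2017, 4, 15), since)
print(status)  # retired
```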

Page 26

Could others fix papers?

• Why must the original authors be the ones to fix maintenance issues?
  – Attract new resources, motivate integration.
• Re-write Step 4 again:
  – On fail, workflow is marked as “needing work”
  – Anyone can contribute that work
    • Those extending the work, grad students, citizen scientists
  – Anyone that succeeds is added as an Author

Page 27

Added as an author??!?

• Just for fixing a bug?

Ok, fine. Let’s re-write the second half of step 4 again:
  – Anyone maintaining a workflow and returning a publication to full extendable status is:
    • Added to paper as an acknowledgement
    • Invited to a conference, given a prize
    • Credited in a visible, public system (think github profile)

Page 28

Takeaways

• Software citation is diverse and fails functions:
  – “Like instrument” and “cite to publication” citations give credit but fail to provide version information
  – Other, informal mentions are better at versions but often fail to give credit
• Software is frequently inaccessible
• Collaboration is counter-motivated by publication
• Bit-rotten papers should create opportunities to earn reputation for scientific contribution.

Page 29

Extras

Page 30

Software packages found

