Software Metadata: Describing "dark software" in GeoSciences

Date post: 21-Jan-2017
Upload: dgarijo
Transcript
Page 1: Software Metadata: Describing "dark software" in GeoSciences

Yolanda Gil, USC Information Sciences Institute

Software Metadata: Describing “dark software” in Geosciences

Yolanda Gil, Daniel Garijo

Information Sciences Institute and Department of Computer Science

University of Southern California @yolandagil, @dgarijov

{gil,dgarijo}@isi.edu

http://www.ontosoft.org Building Block

Page 2: Software Metadata: Describing "dark software" in GeoSciences

We have all been here…

Page 3: Software Metadata: Describing "dark software" in GeoSciences

The Value of Software: Reproducibility

Financial

Human lives

Reliability

Scientific integrity

Trust

Retracted Scientific Studies: A Growing List - NYTimes.com (retrieved 5/29/15)

http://www.nytimes.com/interactive/2015/05/28/science/retractions-scientific-studies.html?smid=tw-nytimesscience&_r=1

The retraction by Science of a study of changing attitudes about gay marriage is the latest prominent withdrawal of research results from scientific literature. And it very likely won't be the last. A 2011 study in Nature found a 10-fold increase in retraction notices during the preceding decade.

Many retractions barely register outside of the scientific field. But in some instances, the studies that were clawed back made major waves in societal discussions of the issues they dealt with. This list recounts some prominent retractions that have occurred since 1980.

Vaccines and Autism

In 1998, The Lancet, a British medical journal, published a study by Dr. Andrew Wakefield that suggested that autism in children was caused by the combined vaccine for measles, mumps and rubella. In 2010, The Lancet retracted the study following a review of Dr. Wakefield's scientific methods and financial conflicts.

Despite challenges to the study, Dr. Wakefield's research had a strong effect on many parents. Vaccination rates tumbled in Britain, and measles cases grew. American antivaccine groups also seized on the research. The United States had more cases of measles in the first month of 2015 than the number that is typically diagnosed in a full year.

Stem Cell Production

Papers published by Japanese researchers in Nature in 2014 claimed to provide an easy method to create multipurpose stem cells, with eventual implications for the treatment of diseases and injuries. Months later, the authors, including Haruko Obokata, issued a retraction. An investigation by one of Japan's most prestigious scientific institutes, where much of the research occurred, found that the author had manipulated some of the images published in the study. Approximately one month after the retraction, one of Ms. Obokata's co-authors, Yoshiki Sasai, was found hanging in a stairwell of his office. He had taken his own life.

Page 4: Software Metadata: Describing "dark software" in GeoSciences

Quantifying the Value of Software through “Reproducibility Maps” [Bourne & Gil et al 12]

2 months of effort in reproducing published method (in PLoS’10)

Authors' expertise was required

Comparison of ligand binding sites

Comparison of dissimilar protein structures

Graph network generation

Molecular Docking

Work with P. Bourne of UCSD

Page 5: Software Metadata: Describing "dark software" in GeoSciences

Geosciences Software Today

There are repositories of model software

There are no shared repositories for other kinds of geosciences software (e.g. model-data preparation services…)

There are general software repositories with no standard metadata

Most scientists are not aware of the value of their software

Most geosciences software is not shared

Page 6: Software Metadata: Describing "dark software" in GeoSciences

“Dark Software”

Models that are not published

• E.g., from a PhD thesis

Data preparation software

• Data pre-processing and QC can take up to 80% of a project’s effort

Visualization software

“Dark Software” is the counterpart of “Dark Data” [Heidorn 2008]

Page 7: Software Metadata: Describing "dark software" in GeoSciences

Recommender system Interoperability

Publication

Community

Learning

Structured metadata Interactive advice

Best practices Multimedia lessons

Page 8: Software Metadata: Describing "dark software" in GeoSciences

Publication

Community

Learning

UK Software Institute

Software Carpentry

CIG ESMF

Critical Zone Observatory

Early Career Advisory Board

FES/ESIP

CSDMS

EarthCube RCNs

EarthCube Building Blocks

Collaborating with SEN C4P EC3

Page 9: Software Metadata: Describing "dark software" in GeoSciences

The OntoSoft Ontology for Describing Scientific Software Metadata [Gil et al 2015]

An ontology for scientific software metadata

• Intended to describe scientific software

• Designed with scientists in mind to guide them to deposit and describe their software in a software registry

Major categories of metadata: what does a scientist need?

1. identify software,

2. understand what it does and its utility for research,

3. execute the software,

4. get support if questions arise,

5. do research with it, and

6. contribute to its development
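The six categories above can be read as a simple grouping of metadata properties. The following is a minimal sketch of that idea in Python, assuming hypothetical property names under each category (they are illustrative placeholders, not the actual terms of the OntoSoft ontology). It also computes a per-category completeness fraction, which is one possible way to derive a completeness indicator.

```python
# Hypothetical property names grouped under the six categories from the slide.
# These are illustrative placeholders, not actual OntoSoft ontology terms.
CATEGORIES = {
    "identify": ["name", "version"],
    "understand": ["description", "domain"],
    "execute": ["language", "dependencies"],
    "get_support": ["contact_email", "documentation_url"],
    "do_research": ["citation", "license"],
    "contribute": ["repository_url", "contribution_guide"],
}


def completeness(record):
    """Return, per category, the fraction of its properties that are filled in."""
    report = {}
    for category, properties in CATEGORIES.items():
        filled = sum(1 for prop in properties if record.get(prop))
        report[category] = filled / len(properties)
    return report


# A made-up entry for a piece of "dark software" with only some metadata recorded.
record = {
    "name": "example-preprocessor",
    "description": "Cleans raw sensor files before modeling",
    "language": "Python",
    "dependencies": ["numpy"],
    "contact_email": "",  # empty values count as missing
}

report = completeness(record)
for category, fraction in report.items():
    print(f"{category:12s} {fraction:.0%}")
```

Grouping properties by what a scientist needs to do makes it easy to see which kind of information is still missing, rather than reporting one overall score.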

Page 10: Software Metadata: Describing "dark software" in GeoSciences

OntoSoft Metadata Categories

http://www.ontosoft.org/software

Page 11: Software Metadata: Describing "dark software" in GeoSciences

Describing Scientific Software in OntoSoft

http://www.ontosoft.org/portal

Metadata can be exported in several formats (HTML, RDF, JSON)

Metadata for 3DDY Software

Metadata properties collected through simple questions

Indicators of metadata completeness

Set permissions for 3DDY

Metadata properties organized into categories that make sense to scientists

Set permission for Documentation metadata for 3DDY software

Crowdsourcing of metadata through access control permissions

Automatic import of metadata from other repositories

Page 12: Software Metadata: Describing "dark software" in GeoSciences

Software entries from distributed repositories are readily accessible

Semantic search

Comparison matrix of software entries

PIHM, PIHMgis, DrEICH, TauDEM, WBMsed

OntoSoft

Metadata completion highlighted

Software is contrasted by property
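A comparison matrix of this kind can be sketched as one row per metadata property and one column per software entry. The example below is a minimal illustration in Python under stated assumptions: the function name, entry names, and property values are all made up for the sketch and do not reflect the portal's implementation or real registry records.

```python
def comparison_matrix(entries, properties):
    """Return one row per property, with one value per software entry.

    Missing values are kept as None so gaps in the metadata stay visible.
    """
    return {prop: [entry.get(prop) for entry in entries] for prop in properties}


# Made-up entries; the names and values are placeholders, not real registry records.
entries = [
    {"name": "ModelA", "language": "C", "license": "GPL-3.0"},
    {"name": "ModelB", "language": "Python"},
]

matrix = comparison_matrix(entries, ["language", "license"])
header = ["property"] + [entry["name"] for entry in entries]
print(" | ".join(header))
for prop, values in matrix.items():
    cells = [value if value is not None else "(missing)" for value in values]
    print(" | ".join([prop] + cells))
```

Keeping missing values explicit lets the same table both contrast software by property and highlight where metadata has not been completed.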

Page 13: Software Metadata: Describing "dark software" in GeoSciences

Conclusions

Geosciences software is a valuable research product

• Must embed best practices of software sharing into research activities

Improve productivity, quality, reproducibility

OntoSoft contributions

• Ontology of scientific software metadata

• Portal for software registry

• Training scientists to write Geoscience Papers of the Future

Sign up for a GPF training session!

http://www.ontosoft.org

http://www.ontosoft.org/software

http://www.ontosoft.org/portal

http://www.ontosoft.org/gpf

Page 14: Software Metadata: Describing "dark software" in GeoSciences

More Information

http://www.ontosoft.org

http://www.ontosoft.org/software

http://www.ontosoft.org/portal

http://www.ontosoft.org/gpf

OntoSoft: Capturing Scientific Software Metadata. Yolanda Gil, Varun Ratnakar, and Daniel Garijo. Proceedings of the Eighth ACM International Conference on Knowledge Capture (K-CAP), 2015.

OntoSoft: A Distributed Semantic Registry for Scientific Software. Yolanda Gil, Daniel Garijo, Saurabh Mishra, and Varun Ratnakar. Under review, 2016.

DRAT: An Unobtrusive, Scalable Approach to Large Scale Software License Analysis. Chris A. Mattmann, Ji-Hyun Oh, Tyler Palsulich, Lewis John McGibbney, Yolanda Gil, and Varun Ratnakar. Proceedings of the Fourth International Workshop on Software Mining, held in conjunction with the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015.

Cyber-Innovated Watershed Research at the Shale Hills Critical Zone Observatory. Xuan Yu, Chris Duffy, Yolanda Gil, Lorne Leonard, Gopal Bhatt, and Evan Thomas. IEEE Systems Journal, to appear.

Collaborative Software Development Needs in Geosciences. Yolanda Gil, Eunyoung Moon and James Howison. Proceedings of the Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2), held in conjunction with the IEEE ACM International Conference on High Performance Computing (SC), New Orleans, LA, November 2014.

Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users. Daniel Garijo, Oscar Corcho, Yolanda Gil, Meredith N. Braskie, Derrek Hibar, Xue Hua, Neda Jahanshad, Paul Thompson, and Arthur W. Toga. Proceedings of the IEEE Conference on e-Science, 2014.

FragFlow: Automated Fragment Detection in Scientific Workflows. Daniel Garijo, Oscar Corcho, Yolanda Gil, Boris A. Gutman, Ivo D. Dinov, Paul Thompson, and Arthur W. Toga. Proceedings of the IEEE Conference on e-Science, Guarujá, Brazil, October 2014.

An Overview of Mobile Applications for Field Science. Anna Zeng, Kevin Zeng, Yolanda Gil, and Matty Mookerjee. GeoSoft Project Report, September 2014.

The CSDMS Standard Names: Cross-Domain Naming Conventions for Describing Process Models, Data Sets and Their Associated Variables. Scott D. Peckham. Proceedings of the Seventh International Congress on Environmental Modeling and Software, San Diego, CA, June 2014.

Web Applications that Share Level-12 HUC Data and Models of the CONUS. Lorne Leonard and Chris Duffy. Proceedings of the Seventh International Congress on Environmental Modeling and Software, San Diego, CA, June 2014.

Intelligent Workflow Systems and Provenance-Aware Software. Yolanda Gil. Proceedings of the Seventh International Congress on Environmental Modeling and Software, San Diego, CA, June 2014.

Page 15: Software Metadata: Describing "dark software" in GeoSciences

Acknowledgements

The OntoSoft project team includes Chris Duffy (PSU), Chris Mattmann (JPL), Scott Peckham (CU), Ji-Hyun Oh (USC), Varun Ratnakar (USC), and Erin Robinson (ESIP)

The Geoscience Papers of the Future ideas were significantly improved through input from GPF pioneers Cedric David (JPL), Ibrahim Demir (UI), Bakinam Essawy (UV), Robinson W. Fulweiler (BU), Jon Goodall (UV), Leif Karlstrom (UO), Kyo Lee (JPL), Heath Mills (UH), Suzanne Pierce (UT), Allen Pope (CU), Mimi Tzeng (DISL), Karan Venayagamoorthy (CSU), Sandra Villamizar (UC), and Xuan Yu (UD)

Thank you to James Howison (UT), Lisa Kempler (MathWorks), and Greg Wilson (Software Carpentry) for their feedback on best practices for software sharing

Thank you to the scientists and other colleagues who have contributed ideas and asked hard questions about software stewardship

Thank you to the National Science Foundation and the EarthCube program for supporting this work

EarthCube ICER-1440323, ICER-1343800


