Reproducibility as a Service

Abstract

Recent studies have demonstrated that the reproducibility of previously published computational experiments is inadequate. Many of these experiments never recorded or preserved their computational environment, including packages installed in the language, libraries installed on the host system, and file locations. Researchers have created reproducibility tools to help mitigate this problem, but these tools assume that the experiment still executes. Thus, they do not facilitate reproducibility of the large number of already published experiments. This situation is not improving; researchers continue to publish without using reproducibility tools. To clarify the gap between what existing reproducibility tools can do and what is required to reproduce published experiments, we define a framework that distinguishes between actions a researcher takes to facilitate reproducibility while the computational environment is available and actions taken to enable reproduction of an experiment after that environment has been lost. The difference between these approaches lies in the availability of a computational environment. Researchers who provide access to the original computational environment perform proactive reproducibility, whereas those who do not provide it allow only retroactive reproducibility. We present Reproducibility as a Service (RaaS), which is, to the best of our knowledge, the first reproducibility tool explicitly designed to facilitate retroactive reproducibility. We demonstrate how RaaS fixes many common errors found in R scripts on Harvard’s Dataverse and preserves the recreated computational environment. Finally, we discuss how a retroactive reproducibility service such as RaaS can also serve as an ‘artifact evaluation assistant’ in a journal’s publication pipeline.

Publication
Software: Practice and Experience